1 Introduction

One problem that often occurs when working on machine vision applications that require large magnifications is that the depth of field becomes progressively smaller as the magnification increases. Since for regular lenses the depth of field is parallel to the image plane, problems frequently occur if objects that are not parallel to the image plane must be imaged in focus. Section 2 describes some of the applications where this problem occurs.

It is well known that the depth of field can be centered around a world plane that is tilted with respect to the image plane. This can be achieved by tilting the lens with respect to the image plane. This is the well-known Scheimpflug principle, which is described in detail in Sect. 3.

When working with tilt lenses, the fact that the ray angles in the object and image space of a lens may differ becomes crucial. Section 4 describes the principles of geometric optics that cause ray angles to be different. This section also shows that there are four kinds of lenses that are in common use: entocentric lenses (perspective in object and image space, with possibly differing ray angles), image-side telecentric lenses (perspective in object space, orthographic in image space), object-side telecentric (orthographic in object space, perspective in image space), and bilateral telecentric (orthographic in object and image space). Each of these kinds of lenses can be tilted, which obviously results in a different imaging geometry for each lens type.

A review of existing camera models for tilt lenses (cf. Sect. 5) has revealed several important shortcomings. We therefore propose a comprehensive camera model for tilt lenses, consisting of a camera model for each of the four kinds of possible tilt lenses, that removes these shortcomings (see Sect. 7 for the proposed models and Sect. 6 for the regular models on which the models we propose are based).

We will also examine under which conditions the parameters of the proposed models can be determined uniquely and under which conditions degeneracies occur (Sect. 7.2). Furthermore, we will show in Sect. 8 that all types of projective cameras can be regarded as tilt cameras in a natural manner.

The camera model we propose is versatile. We show in Sect. 12 that the model can also be used to describe the rectification geometry of a stereo image pair of a perspective and a telecentric camera.

We also describe an algorithm that is capable of calibrating an arbitrary combination of perspective and telecentric cameras (Sect. 9). Furthermore, we describe two computationally efficient and accurate algorithms to remove the bias from circular control points (Sect. 10). Finally, we test the proposed camera models on many real lenses (Sect. 11).

2 Applications

In cameras that are equipped with regular lenses, the portion of the object space that is imaged in focus is parallel to the image plane. The depth of field, i.e., the range of object distances that appear focused in the image, is proportional to the inverse of the square of the magnification of the lens (Lenhardt 2006, Chapter 4.2.13.2). As usual, magnification is defined as the ratio of the size of the image of an object to the size of the object, where size denotes the distance to the optical axis (Lenhardt 2006, Chapter 4.2.3). Consequently, the larger the magnification of the camera, the smaller the depth of field is.
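
To make this scaling concrete, the following sketch evaluates the frequently quoted thin-lens approximation \( \mathrm{DOF} \approx 2 N c_{\mathrm{conf}} (1 + m) / m^2 \), with f-number N and permissible circle of confusion \(c_{\mathrm{conf}}\). The formula and all numerical values are assumptions used only to illustrate the \(1/m^2\) behavior; they are not taken from the cited reference.

```python
# Illustrative sketch only (assumed formula and values): the common thin-lens
# approximation DOF ~ 2 * N * c_conf * (1 + m) / m**2 shows how the depth of
# field shrinks quadratically with the magnification m.
def depth_of_field(m, f_number=8.0, c_conf=0.01):  # c_conf in mm
    return 2.0 * f_number * c_conf * (1.0 + m) / m**2

for m in (0.01, 0.1, 1.0):
    print(f"m = {m:5.2f}: DOF = {depth_of_field(m):8.2f} mm")
# Output: roughly 1616 mm, 17.6 mm, and 0.32 mm, respectively.
```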

The small depth of field at high magnifications becomes problematic whenever it is necessary to image objects in focus that lie in or close to a plane that is not parallel to the image plane. With regular lenses, this is only possible by reducing the size of the aperture stop (typically, the diaphragm), i.e., by increasing the f-number of the lens. However, there is a limit to this approach for two reasons. First, if the aperture stop is made too small, the image will appear blurred because of diffraction (Lenhardt 2006, Chapters 4.3.4–4.3.7). Second, a small aperture stop causes less light to reach the sensor. Consequently, high-powered illumination is required to achieve reasonable exposure times, especially when images of moving objects must be acquired.

There are numerous practical applications in which a plane in object space that is not parallel to the image plane must be imaged in focus. One example is stereo reconstruction, where typically the cameras are used in a converging setup. As shown in Fig. 1, this setup causes the volume in object space for which both cameras produce a sharp image to be a rhomboid-shaped infinite prism. This problem is typically ignored at small magnifications because the common depth of field is large enough. For large magnifications (e.g., larger than 0.1), however, the volume is small enough to cause significant defocus.

Fig. 1

A regular stereo setup with converging cameras. The image planes are visualized by thick solid lines. The depth of field of the two cameras is visualized by dashed lines. The common depth of field is visualized by the gray rhombus. The angles of the two cameras are exaggerated to display the common depth of field more clearly. The surface to be reconstructed is visualized by a thick solid line

One method to cause the volumes in object space that are imaged in focus to become parallel is to use shift lenses, i.e., lenses that are shifted with respect to the sensor, while the optical axis remains perpendicular to the sensor (Willert 1997; Prasad 2000). One major problem is that this approach requires imaging obliquely through the lens, which increases vignetting and optical aberrations and thus decreases image quality (Willert 1997; Prasad 2000). In contrast, using the converging stereo configuration avoids these problems since the rays pass through the lens at angles close and symmetric to the optical axis, i.e., in the area for which lenses are designed to have little aberrations. Furthermore, shifting the lens is impossible if telecentric lenses are used in the stereo setup. Here, obviously, a converging geometry must be used to have any parallax in the images.

Another application where a tilted object plane must be imaged in focus is sheet-of-light 3D reconstruction. Here, a laser with a cylindrical lens projects a laser line onto objects in the world and a camera acquires images of the reflections of the laser line (Gross 2005, Chapter 16.6.1; Beyerer et al. 2016, Chapters 7.3.1 and 7.3.2; Li et al. 2007; Legarda et al. 2011). The projection of the laser line forms a plane in space that is not perpendicular to the optical axis of the camera. Different object distances cause different displacements of the laser line in the image, which allow a 3D reconstruction of the scene. To obtain maximum accuracy, it is necessary that the laser line is in focus for all 3D depths that must be reconstructed, i.e., the entire 3D laser plane emitted by the projector should ideally be in focus. In particle image velocimetry applications, the laser sheet approach is combined with a stereo camera setup (Willert 1997; Prasad 2000). This allows the out-of-plane motion component of the velocity vector of the particles to be determined.

Another application where it is important to image a plane in focus that is tilted with respect to the image plane is fringe projection (Gross 2005, Chapter 16.6.2; Beyerer et al. 2016, Chapter 7.3.4; Albers et al. 2015; Peng et al. 2015). Here, a 2D projector replaces one of the cameras of a stereo camera setup. Consequently, this application is geometrically equivalent to the stereo camera setup described above.

3 Tilt Lenses and the Scheimpflug Principle

It is well known that an arbitrary plane in object space can be imaged in focus by tilting the lens with respect to the image plane (Gross 2005, Chapter 10.5.2). This principle is traditionally credited to Theodor Scheimpflug, who filed a series of Austrian, British, and US patents on the subject in 1902–1904. The patents were granted in 1904–1906 (Scheimpflug 1902a, b, c, 1903a, b, c, 1904). However, the principle was already known to Jules Carpentier, who filed for and received a British patent that describes the principle in 1901 (Carpentier 1901). To Scheimpflug’s credit, he worked out the optical and mathematical principles in detail in his patents, whereas Carpentier simply stated the Scheimpflug principle without any proof.

If a thin lens model is assumed, the Scheimpflug principle states the following:Footnote 1 The object plane (the plane that is in focus), the thin lens’s plane, and the image plane must all meet in a single line. This version of the Scheimpflug principle appears in (Carpentier 1901). The line of intersection is often called the Scheimpflug line (Evens 2008a, b; Merklinger 2010), while Scheimpflug calls it the axis of collineation. In his original Austrian patents, Scheimpflug uses a thick lens model,Footnote 2 for which the condition must be modified as follows: the Scheimpflug line is split into two lines, one in each principal plane of the lens, that are conjugate to each other, i.e., have the same distance and orientation with respect to the principal points of the lens (see Fig. 2). The angles of the object and image planes with respect to the principal planes can be derived from the lens equation and are given by (Scheimpflug 1904):

$$\begin{aligned} \tan \tau ' = \frac{f}{a-f} \tan \tau , \end{aligned}$$
(1)

where f is the focal length of the lens, \(\tau \) is the angle of the object plane with respect to the object-side principal plane, \(\tau '\) is the angle of the image plane with respect to the image-side principal plane, and a is the distance of the intersection point of the optical axis with the object plane from the object-side principal point.
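
To make Eq. (1) concrete, the following minimal sketch computes the required image-plane tilt for assumed example values of f, a, and \(\tau \); the numbers are purely illustrative.

```python
import math

# Numeric sketch of Eq. (1): tan(tau') = f / (a - f) * tan(tau).
# Focal length, object distance, and object-plane angle are assumed examples.
f = 0.05                    # focal length: 50 mm
a = 0.5                     # distance of the object plane from P along the axis: 500 mm
tau = math.radians(60.0)    # tilt of the object plane w.r.t. the principal plane P

tau_prime = math.atan(f / (a - f) * math.tan(tau))
print(math.degrees(tau_prime))   # approx. 10.9 degrees of image-plane tilt
```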

Fig. 2

The Scheimpflug principle. S and \(S'\) are the object-side and image-side Scheimpflug lines

Fig. 3

Refocusing by moving the image plane with respect to the principal plane \(P'\) from \( IP _1\) to \( IP _2\) rotates the object plane that is in focus from \( OP _1\) to \( OP _2\) around the hinge line H

If a plane that is parallel to the image plane is drawn through the object-side principal point, and this plane is intersected with the object-side focal plane, a straight line is obtained (see Fig. 3). This construction can also be performed analogously on the image side of the lens. Scheimpflug calls these lines the counter axes, while in current publications they are called hinge lines (Evens 2008a, b; Merklinger 2010) or pivot lines (Wheeler 2003). The object-side hinge line H has an important geometric significance: if the image is refocused (either by changing the distance of the image plane with respect to the image-side principal point or by changing the focus setting on the lens, which effectively moves the location of the principal planes), the object plane will rotate around the object-side hinge line if the tilt of the image plane remains fixed (Wheeler 2003; Evens 2008a, b; Merklinger 2010). Furthermore, the depth of field is an infinite wedge-shaped region that has the hinge line as its edge (Wheeler 2003; Evens 2008a, b; Merklinger 2010). This can also be seen from Fig. 3. If we interpret the image planes \( IP _1\) and \( IP _2\) as the limits of the depth of focus, the limits of the depth of field are given by \( OP _1\) and \( OP _2\).Footnote 3 Note that positioning the image plane parallel to the principal planes moves the hinge and Scheimpflug lines to infinity, and produces the regular depth-of-field geometry.

With the Scheimpflug principle, it is obvious how to solve the focusing problems in the above applications. For example, for stereo cameras, the image planes must be tilted as shown in Fig. 4. A similar principle holds for fringe projection systems. For sheet-of-light systems, the object plane in Fig. 4 corresponds to the laser plane and there is only one camera.

Fig. 4

A stereo setup with converging cameras and image planes tilted according to the Scheimpflug principle. The image planes are visualized by thick solid lines. The depth of field of the two cameras is visualized by dashed lines, which emanate from the hinge lines. The common depth of field is visualized by the gray rhombus. The angles of the two cameras are exaggerated to display the common depth of field more clearly. The surface to be reconstructed is visualized by a thick solid line

To construct a camera with a tilted lens, there are several possibilities. One popular option is to construct a special camera housing with the desired tilt angle, to which the lens can be attached. This is typically done for specialized applications for which the effort of constructing the housing is justified, e.g., specialized sheet-of-light or fringe-projection sensors. In these applications, the lens is typically tilted around the vertical or horizontal axis of the image, as required by the application. Another possibility is to use a Scheimpflug adapter that allows an arbitrary lens to be attached to a regular camera. Again, often only horizontal or vertical tilt is supported. The most versatile option is to use lenses that have been designed specifically to be tilted in an arbitrary direction. In machine vision parlance, these lenses are typically called Scheimpflug lenses or Scheimpflug optics. They are available as perspective or telecentric lenses. In the consumer SLR camera market, these lenses are typically called tilt/shift lenses or perspective correction lenses (although, technically, perspective correction only requires that the lens can be shifted). Since the ability to tilt the lens is the important feature for the purposes of this paper, we will call these lenses tilt lenses from now on.

Figure 5 shows examples of a machine vision tilt lens and an SLR tilt lens. In both cases, one first selects the direction in which the lens should be tilted by rotating the entire lens assembly, except the part that is attached to the camera housing, and then tilts the lens in that direction. Ideally, a camera model would represent these two tilt parameters in a natural manner that is easy for the user to understand.

Fig. 5

Examples for a machine vision tilt lens (left) and an SLR tilt lens (right). In both cases, the letter A denotes a lock mechanism that can be unlocked (a screw for the machine vision lens and a lever for the SLR lens). After unlocking, the upper part of the lens can be rotated with respect to the part of the lens that is attached to the camera housing to determine the direction into which the lens is tilted. The letter B denotes the mechanism by which the lens can be tilted (a screw for the machine vision lens and a knob for the SLR lens)

4 Required Principles of Geometric Optics

The pinhole model is in widespread use in machine and computer vision to model the projection that standard perspective lenses perform. It is a convenient model that is an abstraction of the projection that happens in a real lens. To develop a projection model for tilt lenses, the pinhole model is inadequate, as we will discuss in this section.

It is well known from Gaussian optics (Lenhardt 2006, Chapter 4.2.11) that any assembly of lenses can be regarded as a single thick lens. The thick lens model is characterized by certain cardinal elements, whose main purpose is to simplify ray tracing through the lens (Lenhardt 2006, Chapter 4.2.4). The principal planes P and \(P'\) are perpendicular to the optical axis. A ray that enters the object-side principal plane P exits the image-side principal plane \(P'\) at the same position with respect to the optical axis (however, in general not at the same angle).Footnote 4 The principal points are the intersections of the principal planes with the optical axis. The focal points F and \(F'\) lie on the optical axis at distances f and \(f'\) from P and \(P'\). For lenses that have the same medium, e.g., air, on both sides of the lens, f and \(f'\) have the same absolute value.Footnote 5 A ray that is parallel to the optical axis in image space or object space passes through F or \(F'\), respectively. Finally, the nodal points N and \(N'\) lie on the optical axis and have the property that a ray that passes through one nodal point emerges at the other nodal point with the same direction. For lenses that have the same medium on both sides of the lens, the nodal points coincide with the principal points.

Fig. 6

The geometry of a thick lens

Figure 6 displays a thick lens with its cardinal elements and three rays that show the three most important principles derived from the above properties (Lenhardt 2006, Chapter 4.2.4):

  • A ray that is parallel to the optical axis in object space passes through the image-side focal point.

  • A ray that is parallel to the optical axis in image space passes through the object-side focal point.

  • A ray that passes through a nodal point exits at the other nodal point with unchanged direction.

If the thick lens model is simplified to the thin lens model, the two principal planes coincide and the two nodal points collapse to the center of the thin lens (Lenhardt 2006, Chapter 4.2.5). While the above three rules remain valid, they simplify. In particular, the third rule simplifies to: a ray that passes through the center of a thin lens does not change its direction. At first glance, this appears to be very similar to the ray geometry of a pinhole camera. It might, therefore, seem obvious to define the projection center of a thin lens as the center of the lens (see, e.g., Hanning 2011, Chapter 2.2). Similarly, for a thick lens, it might seem obvious to define that there are two projection centers: the two nodal points (see, e.g., Mikhail et al. 2001, Chapter 4.3). These definitions mainly would be motivated by the desire to preserve the property of pinhole cameras that the ray directions are identical on both sides of the pinhole.

It is, however, incorrect to regard a thin or thick lens as a pinhole camera. The projection properties of a lens are determined by one essential element of the lens that is neglected by both the thin and thick lens models: the aperture stop. If an object is imaged through a lens, a cone of rays is emitted from the object, passes through the lens, and is imaged sharply in the image plane. The aperture stop is the opening in the lens that restricts the diameter of the cone of rays (Lenhardt 2006, Chapter 4.2.12.2). Thus, it is the analog of the pinhole in a pinhole camera. The virtual image of the aperture stop by all optical elements that come before it (i.e., lie on the object side of the aperture stop) is called the entrance pupil, while the virtual image of the aperture stop by all optical elements that come behind it (i.e., lie on the image side of the aperture stop) is called the exit pupil. Figure 7 shows the cardinal elements of a real system of lenses along with the aperture stop (diaphragm) and the two pupils.

Fig. 7

The pupils and cardinal elements of a real system of lenses. AS is the aperture stop (a diaphragm); \( ENP \) is the entrance pupil; \( EXP \) is the exit pupil

The cone of rays emitted by the object seems to pass through the entrance pupil and seems to emerge from the exit pupil. A ray that passes through the center of the aperture stop is called a chief ray. Since it lies in the center of the cone of rays (with respect to the plane of the aperture stop), it is the analog of the ray through the pinhole of a pinhole camera (Lenhardt 2006, Chapter 4.2.12.4). Note that the chief ray also passes through the center of the entrance and exit pupils (because they are the virtual images of the aperture stop).

The placement of the aperture stop is crucial for the projection properties of a lens. The aperture stop essentially acts as a filter that blocks certain rays in object space.

Often, the aperture stop is placed between the two focal points of the lens, resulting in an entocentric lens (Lenhardt 2006, Chapter 4.2.14.1), as shown in Fig. 8. In entocentric lenses, the entrance and exit pupils lie at finite positions and between the object and the image plane. One important point to note here is that the angle of the chief ray in object space in general is different from the angle of the chief ray in image space (Lenhardt 2006, Chapter 4.2.12.3). Because of the third rule above, the angles would only be identical if the aperture stop were located in one of the principal planes.

Fig. 8

An entocentric lens. \(O_1\) and \(O_2\) are two different object positions; \( AS \) is the aperture stop; \( IP \) is the image plane. For simplicity, only the chief rays are displayed. The image of \(O_1\) is in focus, while the image of \(O_2\) is blurred. Note that the image of \(O_2\) is larger than the image of \(O_1\)

The aperture stop can also be placed at one of the focal points. If an infinitely small aperture stop is positioned at the image-side focal point, according to the first rule above, only rays parallel to the optical axis in object space can pass the aperture stop, i.e., the lens performs a parallel projection in object space (Lenhardt 2006, Chapter 4.2.14.1). The entrance pupil is the virtual image of the aperture stop and is therefore located at infinity (the exit pupil is at a finite location; in this simplified model, it lies at the aperture stop; in a real lens, there could be some optical elements behind the aperture stop). Because the entrance pupil lies at infinity, these lenses are called object-side telecentric. In real lenses, the aperture stop has finite size to allow enough light to reach the sensor. This does not change the telecentric properties (Lenhardt 2006, Chapter 4.2.14.2). However, the chief ray is replaced by a cone of rays that has the chief ray as its center. This causes a finite depth-of-focus and depth-of-field. An object-side telecentric lens is shown in Fig. 9.

Fig. 9

An object-side telecentric lens. \(O_1\) and \(O_2\) are two different object positions; \( AS \) is the aperture stop; \( IP \) is the image plane. For simplicity, only the chief rays are displayed. The image of \(O_1\) is in focus, while the image of \(O_2\) is blurred. Note that the images of \(O_1\) and \(O_2\) have the same size

In an analogous manner, if the aperture stop is positioned at the object-side focal point, by the second rule above, only rays that are parallel to the optical axis in image space can pass the aperture stop. The entrance pupil remains finite, while the exit pupil moves to infinity. These kinds of lenses are called image-side telecentric. Lenses that are designed for digital sensors are often image-side telecentric or nearly so to avoid pixel vignetting (the effect that solid-state sensors are less sensitive to rays that impinge on the sensor non-perpendicularly). An image-side telecentric lens is shown in Fig. 10.

Fig. 10

An image-side telecentric lens. \(O_1\) and \(O_2\) are two different object positions; \( AS \) is the aperture stop; \( IP \) is the image plane. For simplicity, only the chief rays are displayed. The image of \(O_1\) is in focus, while the image of \(O_2\) is blurred. Note that the image of \(O_2\) is larger than the image of \(O_1\)

If an object-side telecentric lens is modified by placing a second lens assembly behind the image-side focal point in such a way that the object-side focal point of the second lens coincides with the image-side focal point of the first lens, i.e., if the object-side focal point of the second lens lies in the center of the aperture stop of the first lens, we obtain a bilateral telecentric lens (Lenhardt 2006, Chapter 4.2.14.2). This construction essentially is identical to attaching an image-side telecentric lens to an object-side telecentric lens (with a single aperture stop, of course). In bilateral telecentric lenses, both the entrance and exit pupils are located at infinity. The advantage of bilateral telecentric lenses over object-side telecentric lenses is that the telecentricity is achieved more accurately since the circle of confusion for bilateral telecentric lenses is more symmetric and suffers less from lens aberrations than for object-side telecentric lenses. Figure 11 shows the ray geometry of a bilateral telecentric lens.

Fig. 11

A bilateral telecentric lens. \(O_1\) and \(O_2\) are two different object positions; \( AS \) is the aperture stop; \( IP \) is the image plane. For simplicity, only the chief rays are displayed. The image of \(O_1\) is in focus, while the image of \(O_2\) is blurred. Note that the images of \(O_1\) and \(O_2\) have the same size

As a final example, the aperture stop can be placed behind the image-side focal point, as shown in Fig. 12. This causes the entrance pupil to lie in object space. If an object is placed between the entrance pupil and the lens, this configuration has the effect that objects that are closer to the camera appear smaller in the image. The canonical example is an image of a die that is oriented parallel to the image plane. This lens geometry enables five of the six sides of the die to be seen in the image. These types of lenses are typically called hypercentric lenses (Lenhardt 2006, Chapter 4.2.14.1; Gross 2005, Chapter 10.3.9; Beyerer et al. 2016, Chapter 3.4.6).

Fig. 12

A hypercentric lens. \(O_1\) and \(O_2\) are two different object positions; \( AS \) is the aperture stop; \( IP \) is the image plane. For simplicity, only the chief rays are displayed. The image of \(O_1\) is in focus, while the image of \(O_2\) is blurred. Note that the image of \(O_2\) is smaller than the image of \(O_1\)

As these examples have shown, simply moving the aperture stop completely changes the projection properties of the lens, even if all the cardinal elements remain unchanged. This shows that neither the nodal points (in the thick lens model) nor the center of the lens (in the thin lens model) can be the projection centers of the lens. The actual projection centers of the lens that explain the projection behaviors that were described above are the centers of the entrance and exit pupils (Lenhardt 2006, Chapter 4.2.12.4; Luhmann et al. 2014, Chapter 3.3.2.2; Evens 2008b, Section 10). In particular, the center of the entrance pupil is the relevant projection center for many computer and machine vision applications since it determines the ray geometry in object space. For example, the center of the entrance pupil is the point around which a camera must be rotated to create a panorama image (Evens 2008b, Section 10).

Note that all the lens models described above are very different from the pinhole model. First, there are two projection centers. Second, in general the ray angles in image space and object space are different. This is the case even for entocentric lenses. An explanation of how the pinhole model is connected to entocentric lenses is given by Lenhardt (2006, Section 4.2.12.4) and Steger et al. (2008, Chapter 2.2.2). To map the entocentric lens model to the pinhole model, we must ensure that the object and image side ray angles are identical. All the ray angles \(\omega \) in object space must remain fixed since they are determined by the geometry of the objects in the scene. Therefore, the center of the entrance pupil must remain fixed. We can, however, move the exit pupil to the same position as the entrance pupil. Finally, if the image plane is perpendicular to the optical axis, we can move the image plane to a position where the image-side ray angles \(\omega ''\) become identical to the object-side ray angles \(\omega \), while keeping the image size \(y'\) fixed. This is shown in Fig. 13. Note that this construction even works for image-side telecentric lenses.

Fig. 13

Modeling an entocentric lens as a pinhole lens. \( ENP \) and \( EXP \) are the entrance and exit pupils with centers Q and \(Q'\). The chief ray enters the entrance pupil at an angle \(\omega \) and exits from the exit pupil at a different angle \(\omega '\). An object of size y is projected to an image of size \(y'\). The distance from the object to the entrance pupil is d, the distance from the exit pupil to the image is \(d'\). To simulate a pinhole lens, \(Q'\) virtually must be moved to Q. To make the image-side ray angle \(\omega ''\) identical to \(\omega \), the image plane must be moved to a distance c (the principal distance) from Q, while keeping \(y'\) fixed. Note that c and \(d'\) differ substantially

For lenses that are telecentric in object space (object-side and bilateral telecentric lenses), we can also pretend that the image space rays have the same angle as the object space rays if the image plane is perpendicular to the optical axis. For bilateral telecentric lenses, this is tautologically true. For object-side telecentric lenses, we can move the image plane to infinity to achieve this. Hence, for both lens types, we can assume that the camera performs an orthographic projection.

If the image plane is tilted with respect to the optical axis, these convenient abstractions unfortunately no longer work correctly. From the examples above, it is obvious that tilting the image plane causes perspective or affine distortions of the image. In this case, it is essential to be able to model the ray geometry in image space correctly to obtain the correct perspective. Let us imagine that the camera is acquiring an image of a square that is perpendicular to the optical axis with a tilted image plane. For an entocentric lens, the image of the square will be a trapezoid.Footnote 6 However, if we do not model the ray geometry in image space correctly, the trapezoid will have the wrong shape.Footnote 7 If we use an image-side telecentric lens, the image of the square will be a rectangle. Therefore, we can see that while the pinhole model is able to model entocentric and image-side telecentric lenses correctly if the image plane is perpendicular to the optical axis, this clearly is no longer the case for tilt lenses (tilted image planes). Similarly, for bilateral telecentric lenses, the image of the square is a rectangle, while for object-side telecentric lenses, it is a trapezoid. Again, it is evident that the orthographic projection model is inadequate for tilt lenses. From this discussion, it is clear that the projection center that is relevant for modeling the tilt correctly is the center of the exit pupil and its geometric relation to the image plane.

We close this section by mentioning that the ray angles \(\omega \) and \(\omega '\) (cf. Fig. 13) are related to the pupil magnification factor \(m_{\mathrm {p}}\) by (see Lenhardt 2006, Chapter 4.2.12.3)

$$\begin{aligned} \frac{\tan \omega }{\tan \omega '} = m_{\mathrm {p}} , \end{aligned}$$
(2)

where \(m_{\mathrm {p}}\) is the diameter of the exit pupil divided by the diameter of the entrance pupil. Consequently, whenever the entrance and exit pupils have different sizes, the ray angles in object and image space will be different. Therefore, by simply looking at the lens from the front and the back and comparing the apparent sizes of the entrance and exit pupils, we can easily determine whether the ray angles are different.
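
The following sketch applies Eq. (2) to convert an object-side chief ray angle into the corresponding image-side angle; the pupil diameters and the angle are assumed example values.

```python
import math

# Sketch of Eq. (2): tan(omega) / tan(omega') = m_p, with m_p the ratio of the
# exit pupil diameter to the entrance pupil diameter (assumed example values).
d_exit, d_entrance = 8.0, 12.0            # pupil diameters in mm
m_p = d_exit / d_entrance                 # pupil magnification factor

omega = math.radians(20.0)                # object-side chief ray angle
omega_prime = math.atan(math.tan(omega) / m_p)
print(math.degrees(omega_prime))          # approx. 28.6 degrees in image space
```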

5 Related Work

5.1 Related Work on the Calibration of Cameras with Tilt Lenses

In this section, we will discuss previous approaches to the calibration of cameras with tilt lenses. We will only discuss approaches that provide an explicit geometric camera model and that calibrate the model parameters. For example, we will not discuss approaches that only compute a rectification from the image plane to the laser plane (Willert 1997; Walker 2002; Konrath and Schröder 2000, 2002), or that do not describe how the camera parameters are calibrated (Fournel et al. 2003).

A review of existing approaches has shown that there are problems that occur frequently. To avoid tedious repetitions, rather than discussing the existing approaches individually, we discuss the problems and list the approaches together with the problems. Every existing approach discussed below exhibits at least two of the problems.

It is interesting to note that there is no single approach that can handle all the lens types that are relevant for machine vision applications (perspective, image-side telecentric, object-side telecentric, and bilateral telecentric, as discussed in Sect. 4). Existing approaches always deal with a single lens type. Perspective lenses are treated in (Gerdes et al. 1993a, b; Gennery 2001, 2006; Fournel et al. 2004, 2006; Louhichi et al. 2006, 2007; Haig et al. 2006; Li et al. 2007; Wang et al. 2008; Legarda et al. 2011, 2013; Hamrouni et al. 2012; Astarita 2012; Cornic et al. 2015; Fasogbon et al. 2015; Albers et al. 2015; Kumar and Ahuja 2014a, b, 2015) and object-side telecentric lenses in (Peng et al. 2015). There are no approaches for image-side and bilateral telecentric lenses.

As discussed in Sect. 3, a lens can typically be tilted in an arbitrary direction (see also Walker 2002). Yet, there are approaches that only model a lens tilt in one direction. For some approaches (Fournel et al. 2006; Louhichi et al. 2006, 2007; Astarita 2012), it is unclear in which direction the tilt is modeled, but there is only a single tilt parameter, which precludes tilting in an arbitrary direction. Furthermore, Li et al. (2007) only model a tilt around the x-axis, while Haig et al. (2006) only model a tilt in the direction of gravity.

More importantly, almost all of the approaches that handle perspective lenses do not model different ray angles in object and image space (Gerdes et al. 1993a, b; Gennery 2001, 2006; Fournel et al. 2004, 2006; Louhichi et al. 2006, 2007; Haig et al. 2006; Li et al. 2007; Wang et al. 2008; Legarda et al. 2011, 2013; Hamrouni et al. 2012; Astarita 2012; Kumar and Ahuja 2014b; Cornic et al. 2015; Fasogbon et al. 2015; Albers et al. 2015). The only exception is the approach by Kumar and Ahuja (2014a, 2015).Footnote 8 Their camera model, however, is unnecessarily complex and contains redundant parameters. In their approach, the tilt and differing ray angles are modeled by at least four parameters, whereas three are sufficient, as will be described in Sect. 7. Furthermore, the model requires data to be specified that few lens manufacturers provide, such as the distances from the two principal planes to the respective pupils. This makes it impractical to use for typical machine vision users.

As will be shown in this paper, the model of the mapping of tilted lenses includes important degeneracies that imply that certain parameters of the model cannot be determined uniquely. These degeneracies are not analyzed by any existing approach (Gerdes et al. 1993a, b; Gennery 2001, 2006; Fournel et al. 2004, 2006; Louhichi et al. 2006, 2007; Haig et al. 2006; Li et al. 2007; Wang et al. 2008; Legarda et al. 2011, 2013; Hamrouni et al. 2012; Astarita 2012; Cornic et al. 2015; Fasogbon et al. 2015; Albers et al. 2015; Peng et al. 2015; Kumar and Ahuja 2014a, b, 2015).

The failure to model the different ray directions in object and image space together with the failure to exclude certain interior orientation parameters in degenerate cases, which would be necessary to handle the degenerate cases correctly, as shown in Sect. 7, means that most existing camera models can only handle the case of horizontal and vertical tilts without excessive residual errors (Gerdes et al. 1993a, b; Gennery 2001, 2006; Fournel et al. 2004, 2006; Louhichi et al. 2006, 2007; Haig et al. 2006; Li et al. 2007; Wang et al. 2008; Legarda et al. 2011, 2013; Hamrouni et al. 2012; Astarita 2012; Kumar and Ahuja 2014b; Cornic et al. 2015; Fasogbon et al. 2015; Albers et al. 2015). However, in this case the camera parameters will not correspond to the correct values, even if the model returns small residual errors.

Another problem is that many approaches parameterize the tilt mapping by Euler angles (Gerdes et al. 1993a, b; Wang et al. 2008; Legarda et al. 2011, 2013; Hamrouni et al. 2012; Cornic et al. 2015; Fasogbon et al. 2015; Albers et al. 2015; Peng et al. 2015; Kumar and Ahuja 2014a, b, 2015). As can be seen from the example lenses in Sect. 3, this kind of parameterization is unintuitive for the user if the tilt is not horizontal or vertical.

A few of the proposed approaches use a self-calibration method in which the coordinates of the control points of the calibration object are determined along with the camera parameters. It is well known that self-calibration cannot recover the scale of the scene, i.e., of the calibration object (Hartley and Zisserman 2003, Chapters 10.2 and 19.1; Luhmann et al. 2014, Chapter 4.4). This fact is ignored in (Fournel et al. 2006; Louhichi et al. 2006; Hamrouni et al. 2012). It is acknowledged implicitly in (Louhichi et al. 2007) or explicitly in (Fournel et al. 2004; Cornic et al. 2015), but no calibration for the undetermined scale is provided. However, it is known from photogrammetry that some form of measured distances or control points, e.g., a scale bar, are necessary to recover the scale (Luhmann et al. 2014, Chapters 4.4.3, 7.3.1.5, and 7.3.2.1; Haig et al. 2006). Known control points or distances effectively constitute a calibration object. Therefore, we might as well use a calibration object with known control points.

Some of the calibration approaches use circular control points. It is well known that this may lead to biased calibration results if the bias is not taken into account and removed (Heikkilä 2000). Many of the existing approaches exhibit this problem (Gennery et al. 1987; Gennery 2001, 2006; Fournel et al. 2004; Louhichi et al. 2006, 2007; Li et al. 2007; Haig et al. 2006; Legarda et al. 2011; Astarita 2012; Cornic et al. 2015; Albers et al. 2015; Peng et al. 2015). In some approaches, the bias is taken into account (Legarda et al. 2013; Fasogbon et al. 2015), albeit in an inefficient manner by the methods of Datta et al. (2009) or Vo et al. (2011), to be discussed in Sect. 5.2.

Finally, some approaches inherently require a 3D calibration object (Gerdes et al. 1993a, b; Gennery 2001, 2006). This is a disadvantage in many machine vision applications because 3D calibration objects are more expensive to manufacture accurately and they are sometimes more cumbersome to handle for the users.

5.2 Related Work on the Bias Removal for Circular Control Points

It is well known that there are two kinds of bias that may affect the location of the projection of circular control points in the image. First, in an ideal pinhole camera that does not have lens distortions, the projection of the circular control point into the image is an ellipse. The center of the ellipse is typically used as the projected center of the corresponding control point. However, the projection of the center of a circle in 3D in general is not the center of the ellipse in the image (Lenz 1988, Chapter 10.5; Lenz and Fritsch 1990; Ahn et al. 1999; Heikkilä 2000; Mallon and Whelan 2007). Therefore, using the centers of the ellipses introduces a bias in the control point locations in the image that affects the accuracy of the camera calibration. Like Mallon and Whelan (2007), we will call this bias the perspective bias.
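
The following sketch illustrates the perspective bias for a distortion-free pinhole camera: the image of a circle under a plane-to-image homography H is the conic \(H^{-\top } C H^{-1}\), and the center of this ellipse in general differs from the image of the circle's center. The homography and circle parameters are assumed example values; this is only a demonstration of the effect, not one of the bias-correction methods discussed below.

```python
import numpy as np

# Perspective bias demo for a distortion-free pinhole camera.
# H and the circle parameters are assumed example values.
H = np.array([[800.0,  10.0, 320.0],
              [  5.0, 820.0, 240.0],
              [  0.1,   0.2,   1.0]])     # plane-to-image homography

x0, y0, r = 0.3, 0.2, 0.05                # circular mark on the calibration plane
C = np.array([[1.0, 0.0, -x0],
              [0.0, 1.0, -y0],
              [-x0, -y0, x0**2 + y0**2 - r**2]])   # conic matrix of the circle

Hinv = np.linalg.inv(H)
Cimg = Hinv.T @ C @ Hinv                  # image of the circle: an ellipse

# Center of the image ellipse (stationary point of the conic).
ellipse_center = np.linalg.solve(Cimg[:2, :2], -Cimg[:2, 2])

# Image of the circle's center point.
p = H @ np.array([x0, y0, 1.0])
projected_center = p[:2] / p[2]

print(ellipse_center - projected_center)  # the perspective bias (in pixels)
```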

Second, if the lens exhibits lens distortions, the projection of a circle in 3D into the image typically is no longer an ellipse. Thus, if the camera calibration assumes an ellipse in the image to perform the extraction of control points, e.g., by fitting an ellipse to the control point’s edges in the image (Heikkilä 1997, Chapter 2; Lanser 1997, Chapter 3.2.1.1) or by using gray value moments (Mallon and Whelan 2007), there will be a further bias that can affect the accuracy of the calibration (Lenz 1988, Chapter 10.5; Lenz and Fritsch 1990; Mallon and Whelan 2007). Like Mallon and Whelan (2007), we will call this bias the distortion bias.

The simplest approach to deal with the bias is to ignore it. As discussed in Sect. 5.1, this is a popular approach.

A slightly more elaborate approach is to require that the circles of the control points must be small enough that the perspective bias (and, consequently, the distortion bias) is sufficiently small (Dold 1996, 1997; Ahn et al. 1999; Mallon and Whelan 2007). Furthermore, Dold (1997) and Ahn et al. (1999) claim that, while the perspective bias negatively affects the interior and exterior orientation of the cameras in the calibrated setup, a 3D reconstruction obtained from the biased orientation parameters is largely unaffected by the bias. It was later shown by Otepka (2004) that this assertion is not true in general.

The approaches by Fournel et al. (2006) and Hamrouni et al. (2012), mentioned in Sect. 5.1, use the intersection points of bitangents to the circular marks as control points. While these points are invariant to projection, they are not invariant to lens distortion, i.e., some bias remains.

An elegant solution to remove the perspective bias was proposed in (Heikkilä 1997, Chapter 3.5; Heikkilä 2000). The camera is first calibrated using the biased ellipse centers. Then, the calibrated camera parameters are used to determine the ellipse centers by projecting the control point borders into the image and determining their centers analytically. Furthermore, the projection of the center of the control point is determined analytically. These two points define an offset that is used to correct the extracted ellipse center and then to perform the calibration again using the thus corrected control points. For self-calibration, a similar approach is proposed by (Otepka 2004; Otepka and Fraser 2004). Since the distortion bias is not handled by these approaches, it can be expected that some bias remains in the calibrated camera parameters.

We now turn to approaches that address the perspective as well as the distortion bias.

The effect of perspective and distortion bias is analyzed in (Lenz 1988, Chapter 10.5; Lenz and Fritsch 1990). Approximate formulas (based on several simplifications of the problem) are given that allow the two kinds of bias to be predicted if the interior and exterior orientation of the camera are known. Lenz (1988, Chapter 10.5) describes an approach to reduce the bias. The camera is first calibrated with the biased control point locations. Then, the approximations to the two kinds of bias are predicted and applied to the control point locations. No analysis of how the accuracy of the camera parameters is improved by this approach is performed. Since the two kinds of bias are treated only in an approximate manner, it can be assumed that some residual bias is still present in the camera parameters.

The approach by Douxchamps and Chihara (2009) follows the same general steps that the above approaches use (iteration of the camera calibration and the improvement of the control point positions). The improvement of the control point locations is performed by computing a synthetic template of the control point based on the current orientation parameters and then maximizing the correlation of the template and the image around the current estimate of the control point location. The computation of the synthetic template is based on a kind of ray tracing algorithm and therefore is computationally very expensive. This makes the approach unattractive for time-critical applications.

Finally, the approaches by Datta et al. (2009) and Vo et al. (2011) also follow the general scheme of iterating the camera calibration and the improvement of the control point positions. The difference is that the control point correction is performed by removing the lens distortions from each calibration image and projecting it into a plane that is parallel to the image plane using the current interior and exterior orientation parameters. The positions of the control points are then extracted in the rectified images and are projected into the image. The control points are extracted either through ellipse fitting (Datta et al. 2009) or through template matching (Datta et al. 2009; Vo et al. 2011). Since these approaches rely on rectifying all calibration images once per outer iteration and then performing the control point extraction anew for each rectified image, they are very time-consuming, and therefore unattractive for time-critical applications.

6 Models for Cameras with Regular Lenses

6.1 Camera Models

The models for cameras with tilt lenses that will be proposed in Sect. 7 are all extensions of models for cameras with regular lenses. Therefore, we will discuss these models first. Our presentation is based on the description in (Steger et al. 2008, Chapter 3.9), but extends that description to the multi-view stereo case.

Suppose the multi-view stereo setup consists of \(n_{\mathrm {c}}\) cameras. For the calibration, the calibration object is placed in \(n_{\mathrm {o}}\) different poses, and each pose is imaged by the \(n_{\mathrm {c}}\) cameras. The calibration object has its own coordinate system. Each pose \(l \ (l = 1, \ldots , n_{\mathrm {o}})\) of the calibration object defines a transformation from the calibration object coordinate system to the coordinate system of each camera \(k \ (k = 1, \ldots , n_{\mathrm {c}})\). In applications, a certain pose of the calibration object is often used to define a world coordinate system (Steger 1998, Chapter 3.9.4).

First, a point \({{\varvec{p}}}_{\mathrm {o}} = (x_{\mathrm {o}}, y_{\mathrm {o}}, z_{\mathrm {o}})^\top \) given in the calibration object coordinate system at pose l is transformed into a point \({{\varvec{p}}}_l\) in some reference camera coordinate system using a rigid 3D transformation:

$$\begin{aligned} {{\varvec{p}}}_l = {{{\mathbf {\mathtt{{R}}}}}}_l {{\varvec{p}}}_{\mathrm {o}} + {{\varvec{t}}}_l , \end{aligned}$$
(3)

where \({{\varvec{t}}}_l = (t_{l,x}, t_{l,y}, t_{l,z})^\top \) is a translation vector and \({{{\mathbf {\mathtt{{R}}}}}}_l\) is a rotation matrix that is parameterized by Euler angles: \({{{\mathbf {\mathtt{{R}}}}}}_l = {{{\mathbf {\mathtt{{R}}}}}}_x(\alpha _l) {{{\mathbf {\mathtt{{R}}}}}}_y(\beta _l) {{{\mathbf {\mathtt{{R}}}}}}_z(\gamma _l)\). Without loss of generality, we can assume that the reference camera coordinate system is that of camera 1. As usual, if we represent the points by homogeneous coordinates, we can write the transformation in (3) as a \(4 \times 4\) homogeneous matrix:

$$\begin{aligned} {{\varvec{p}}}_l = {{{\mathbf {\mathtt{{H}}}}}}_l {{\varvec{p}}}_{\mathrm {o}} = \left( \begin{array}{cc} {{{\mathbf {\mathtt{{R}}}}}}_l &{} {{\varvec{t}}}_l \\ {{\varvec{0}}}^\top &{} 1 \end{array} \right) {{\varvec{p}}}_{\mathrm {o}} . \end{aligned}$$
(4)

Next, the point \({{\varvec{p}}}_l\) is transformed into the camera coordinate system of camera k using

$$\begin{aligned} {{\varvec{p}}}_k = {{{\mathbf {\mathtt{{R}}}}}}_k {{\varvec{p}}}_l + {{\varvec{t}}}_k , \end{aligned}$$
(5)

where \({{{\mathbf {\mathtt{{R}}}}}}_k = {{{\mathbf {\mathtt{{R}}}}}}_x(\alpha _k) {{{\mathbf {\mathtt{{R}}}}}}_y(\beta _k) {{{\mathbf {\mathtt{{R}}}}}}_z(\gamma _k)\) and \({{\varvec{t}}}_k = (t_{k,x}, t_{k,y}, t_{k,z})^\top \) describe the relative pose of camera k with respect to camera 1. Again, we can represent this transformation by a homogeneous matrix \({{{\mathbf {\mathtt{{H}}}}}}_k\). Note that, with the above convention, \({{{\mathbf {\mathtt{{R}}}}}}_k = {{{\mathbf {\mathtt{{I}}}}}}\) and \({{\varvec{t}}}_k = {{\varvec{0}}}\) for \(k = 1\). Thus, if we only calibrate a single camera, the transformation in (5) is redundant and can be omitted. We will make use of this fact below to simplify the discussion where appropriate.
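
As a small illustration of (3)–(5), the following sketch composes the two rigid transformations as \(4 \times 4\) homogeneous matrices; all angles and translations are assumed example values.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def homogeneous(alpha, beta, gamma, t):
    """4x4 matrix of Eq. (4) with R = Rx(alpha) Ry(beta) Rz(gamma)."""
    H = np.eye(4)
    H[:3, :3] = rot_x(alpha) @ rot_y(beta) @ rot_z(gamma)
    H[:3, 3] = t
    return H

# Assumed example poses: pose l of the calibration object and relative pose of camera k.
H_l = homogeneous(0.1, -0.2, 0.3, [0.01, 0.02, 0.5])
H_k = homogeneous(0.0, 0.05, 0.0, [0.1, 0.0, 0.0])

p_o = np.array([0.03, -0.01, 0.0, 1.0])   # point on the calibration object
p_k = H_k @ H_l @ p_o                      # Eqs. (3) and (5) in homogeneous form
print(p_k[:3])
```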

As discussed in Sect. 4, the relevant projection center for modeling the pose of the camera is the center of the entrance pupil. Therefore, we assume that the origin of the camera coordinate system is the center of the entrance pupil of the respective camera and that the z axis of the camera coordinate system points along the optical axis towards the scene.

Next, the point \({{\varvec{p}}}_k = (x_k, y_k, z_k)^\top \) is projected into the image plane. For lenses that are perspective in object space (i.e., perspective and image-side telecentric lenses), the projection is given by:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) = \frac{c}{z_k} \left( \begin{array}{c} x_k \\ y_k \end{array} \right) , \end{aligned}$$
(6)

where c is the principal distance. Note that we use the assumption that the ray angles in image space and object space are identical, which is a valid assumption since in this section we model the image plane as perpendicular to the optical axis (see the discussion in Sect. 4). If we use homogeneous coordinates, we can represent the projection in (6) by the \(3 \times 4\) matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {p}} = \left( \begin{array}{cccc} c &{} 0 &{} 0 &{} 0 \\ 0 &{} c &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{array} \right) . \end{aligned}$$
(7)

For lenses that are telecentric in object space (i.e., object-side and bilateral telecentric lenses), the projection is given by:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) = m \left( \begin{array}{c} x_k \\ y_k \end{array} \right) , \end{aligned}$$
(8)

where m is the magnification of the lens. Again, we use the assumption that the ray angles in image space and object space are identical. If we use homogeneous coordinates, we can represent the projection in (8) by the \(3 \times 4\) matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {o}} = \left( \begin{array}{cccc} m &{} 0 &{} 0 &{} 0 \\ 0 &{} m &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(9)
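
The following sketch contrasts the two projections (6) and (8) for a point given in camera coordinates; the principal distance, magnification, and point are assumed example values.

```python
import numpy as np

def project_perspective(p_k, c):
    """Eq. (6): perspective projection with principal distance c."""
    x_k, y_k, z_k = p_k
    return c / z_k * np.array([x_k, y_k])

def project_telecentric(p_k, m):
    """Eq. (8): parallel (telecentric) projection with magnification m."""
    x_k, y_k, _ = p_k
    return m * np.array([x_k, y_k])

# Assumed example values.
p_k = np.array([0.02, -0.01, 0.5])         # point in camera coordinates (meters)
print(project_perspective(p_k, c=0.016))   # principal distance 16 mm
print(project_telecentric(p_k, m=0.3))     # magnification 0.3
```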

Conceptually, the undistorted point \((x_{\mathrm {u}}, y_{\mathrm {u}})^\top \) is then distorted to a point \((x_{\mathrm {d}}, y_{\mathrm {d}})^\top \). In our approach, two distortion models are supported. The first model has been called the division model by Fitzgibbon (2001). It was in common use, however, more than a decade before the term division model was coined (Lenz 1987, 1988; Lenz and Fritsch 1990; Lanser et al. 1995; Lanser 1997; Blahusch et al. 1999). In the division model, the undistorted point \((x_{\mathrm {u}}, y_{\mathrm {u}})^\top \) is computed from the distorted point as follows:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) = \frac{1}{1 + \kappa r_{\mathrm {d}}^2} \left( \begin{array}{c} x_{\mathrm {d}} \\ y_{\mathrm {d}} \end{array} \right) , \end{aligned}$$
(10)

where \(r_{\mathrm {d}}^2 = x_{\mathrm {d}}^2 + y_{\mathrm {d}}^2\). The division model has the advantage that it can be inverted analytically. The distorted point can be computed by:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {d}} \\ y_{\mathrm {d}} \end{array} \right) = \frac{2}{1 + \sqrt{1 - 4 \kappa r_{\mathrm {u}}^2}} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) , \end{aligned}$$
(11)

where \(r_{\mathrm {u}}^2 = x_{\mathrm {u}}^2 + y_{\mathrm {u}}^2\). Note that the division model only supports radial distortion.
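
A minimal sketch of the division model showing the round trip between (11) and (10); the value of \(\kappa \) is an assumed example (for image-plane coordinates in meters).

```python
import numpy as np

def undistort_division(xd, yd, kappa):
    """Eq. (10): undistorted point computed from the distorted point."""
    s = 1.0 / (1.0 + kappa * (xd**2 + yd**2))
    return s * xd, s * yd

def distort_division(xu, yu, kappa):
    """Eq. (11): analytic inverse of the division model."""
    s = 2.0 / (1.0 + np.sqrt(1.0 - 4.0 * kappa * (xu**2 + yu**2)))
    return s * xu, s * yu

# Round trip with an assumed distortion coefficient.
xd, yd = distort_division(0.004, -0.003, kappa=-800.0)
print(undistort_division(xd, yd, kappa=-800.0))   # recovers (0.004, -0.003)
```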

The second model that is supported is the polynomial model proposed by Brown (1966, 1971), which models radial as well as decentering distortions. The undistorted point is computed by:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) =\left( \begin{array}{l} x_{\mathrm {d}} \left( 1 + K_1 r_{\mathrm {d}}^2 + K_2 r_{\mathrm {d}}^4 + K_3 r_{\mathrm {d}}^6\right) \\ \phantom {1ex} + \left( P_1 (r_{\mathrm {d}}^2 + 2 x_{\mathrm {d}}^2) + 2 P_2 x_{\mathrm {d}} y_{\mathrm {d}}\right) \\ y_{\mathrm {d}} \left( 1 + K_1 r_{\mathrm {d}}^2 + K_2 r_{\mathrm {d}}^4 + K_3 r_{\mathrm {d}}^6\right) \\ \phantom {1ex} + \left( 2 P_1 x_{\mathrm {d}} y_{\mathrm {d}} + P_2 (r_{\mathrm {d}}^2 + 2 y_{\mathrm {d}}^2)\right) \end{array} \right) . \end{aligned}$$
(12)

The polynomial model cannot be inverted analytically. The computation of the distorted point from the undistorted point must be performed by a numerical root finding algorithm. This is significantly slower than the analytical inversion in the division model.
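
The following sketch shows one possible numerical inversion of (12) by a simple fixed-point iteration; the distortion coefficients are assumed example values, and the iteration is only a sketch of such a root-finding scheme, not a description of the implementation used here.

```python
import numpy as np

def undistort_brown(xd, yd, K1, K2, K3, P1, P2):
    """Eq. (12): undistorted point computed from the distorted point."""
    rd2 = xd**2 + yd**2
    radial = 1.0 + K1 * rd2 + K2 * rd2**2 + K3 * rd2**3
    xu = xd * radial + P1 * (rd2 + 2.0 * xd**2) + 2.0 * P2 * xd * yd
    yu = yd * radial + 2.0 * P1 * xd * yd + P2 * (rd2 + 2.0 * yd**2)
    return xu, yu

def distort_brown(xu, yu, K1, K2, K3, P1, P2, iterations=20):
    """Numerical inversion of Eq. (12) by fixed-point iteration (sketch only)."""
    xd, yd = xu, yu                        # start at the undistorted point
    for _ in range(iterations):
        xt, yt = undistort_brown(xd, yd, K1, K2, K3, P1, P2)
        xd, yd = xd - (xt - xu), yd - (yt - yu)
    return xd, yd

# Round trip with assumed example coefficients (image-plane coordinates in mm).
coeffs = dict(K1=1e-3, K2=-1e-6, K3=0.0, P1=1e-5, P2=-2e-5)
xd, yd = distort_brown(2.0, -1.5, **coeffs)
print(undistort_brown(xd, yd, **coeffs))   # approx. (2.0, -1.5)
```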

In our experience, these two models are accurate enough to handle all lenses that are typically encountered in machine vision applications. However, it would be easy to extend the camera model with more general models that treat distortions as transformations in the image plane, e.g., the rational distortion model (Claus and Fitzgibbon 2005), should the need arise.

Finally, the distorted point \((x_{\mathrm {d}}, y_{\mathrm {d}})^\top \) is transformed into the image coordinate system:

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {i}} \\ y_{\mathrm {i}} \end{array} \right) =\left( \begin{array}{c} \displaystyle \frac{x_{\mathrm {d}}}{s_x} + c_x \\ \displaystyle \frac{y_{\mathrm {d}}}{s_y} + c_y \end{array} \right) . \end{aligned}$$
(13)

Here, \(s_x\) and \(s_y\) denote the pixel sizes on the sensor (more accurately: the pixel pitch) and \((c_x, c_y)^\top \) is the principal point. Note that x refers to the horizontal axis of the image (increasing rightward) and y to the vertical axis (increasing downward). If we use homogeneous coordinates, we can represent the transformation in (13) by the \(3 \times 3\) calibration matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}} =\left( \begin{array}{ccc} 1/s_x &{} 0 &{} c_x \\ 0 &{} 1/s_y &{} c_y \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(14)

The camera model therefore consists of the following parameters:

  • The six parameters of the exterior orientation (modeling the pose of the calibration objects in the \(n_{\mathrm {o}}\) images): \(\alpha _l, \beta _l, \gamma _l, t_{l,x}, t_{l,y}\), and \(t_{l,z}\).

  • The six parameters of the relative orientation of the \(n_{\mathrm {c}}\) cameras with respect to camera 1: \(\alpha _k, \beta _k, \gamma _k, t_{k,x}, t_{k,y}\), and \(t_{k,z}\).

  • The interior orientation of each camera (for simplicity, we omit the subscripts k here): c or m; \(\kappa \) or \(K_1, K_2, K_3, P_1, P_2\); \(s_x, s_y, c_x\), and \(c_y\).

The above parameterization is very intuitive for machine vision users. All parameters have a physical meaning that is easy to understand. Approximate initial values for the interior orientation parameters can simply be read off the data sheets of the camera (\(s_x, s_y\)) and the lens (c or m) or can be obtained easily otherwise (the initial values for the principal point can be set to the center of the image and the distortion coefficients can typically be set to 0). Furthermore, the calibration results are easy to check for validity.
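
To summarize the projection chain, the following sketch maps a point on the calibration object to pixel coordinates for a single perspective camera with the division model (the relative pose (5) is therefore omitted); all parameter values are assumed examples rather than calibration results.

```python
import numpy as np

def project_point(p_o, R, t, c, kappa, sx, sy, cx, cy):
    p = R @ p_o + t                        # Eq. (3): object -> camera coordinates
    xu, yu = c / p[2] * p[:2]              # Eq. (6): perspective projection
    s = 2.0 / (1.0 + np.sqrt(1.0 - 4.0 * kappa * (xu**2 + yu**2)))
    xd, yd = s * xu, s * yu                # Eq. (11): apply the division-model distortion
    return np.array([xd / sx + cx, yd / sy + cy])   # Eq. (13): pixel coordinates

R = np.eye(3)                              # assumed pose of the calibration object
t = np.array([0.01, -0.02, 0.4])
print(project_point(np.array([0.05, 0.03, 0.0]), R, t,
                    c=0.016, kappa=-50.0,  # 16 mm lens, small radial distortion
                    sx=4.4e-6, sy=4.4e-6,  # 4.4 um pixel pitch
                    cx=640.0, cy=480.0))
```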

6.2 Model Degeneracies

Remark 1

The camera models above are overparameterized. We cannot determine c or m, \(s_x\), and \(s_y\) uniquely since they functionally depend on each other. We can solve this problem by leaving \(s_y\) at the initial value the user specifies. This is a valid approach since camera signals are always transmitted line-synchronously, i.e., the acquired image necessarily has the same vertical pixel pitch as the image on the sensor. For \(s_x\), this is not always the case. If the image is transmitted via an analog signal, the image may not be sampled pixel-synchronously by the frame grabber (Steger et al. 2008, Chapter 3.9).

Another frequently used method to remove the overparameterization is to move the parameters c and m from the projection equations (6) and (8) to the transformation into pixel coordinates (13). Then, (6) becomes

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) = \frac{1}{z_k}\left( \begin{array}{c} x_k \\ y_k \end{array} \right) \end{aligned}$$
(15)

with projection matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {p}} = \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{array} \right) , \end{aligned}$$
(16)

(8) becomes

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {u}} \\ y_{\mathrm {u}} \end{array} \right) =\left( \begin{array}{c} x_k \\ y_k \end{array} \right) \end{aligned}$$
(17)

with projection matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {o}} = \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{array} \right) , \end{aligned}$$
(18)

and (13) becomes

$$\begin{aligned} \left( \begin{array}{c} x_{\mathrm {i}} \\ y_{\mathrm {i}} \end{array} \right) =\left( \begin{array}{c} \displaystyle a_x x_{\mathrm {d}} + c_x \\ \displaystyle a_y y_{\mathrm {d}} + c_y \end{array} \right) \end{aligned}$$
(19)

with calibration matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}} =\left( \begin{array}{ccc} a_x &{} 0 &{} c_x \\ 0 &{} a_y &{} c_y \\ 0 &{} 0 &{} 1 \end{array} \right) , \end{aligned}$$
(20)

where \((a_x, a_y) = (c/s_x, c/s_y)\) for cameras that are perspective in object space and \((a_x, a_y) = (m/s_x, m/s_y)\) for cameras that are telecentric in object space. We have used this kind of parameterization in the past for telecentric cameras (Steger et al. 2008, Chapter 3.9; Blahusch et al. 1999). However, user feedback has shown us that this mingling of parameters is difficult for many users to work with.

The interesting aspect of the above discussion is that we are free to use the parameters c and m in the projection matrices or the calibration matrices according to our needs. This enables us to separate all the parts of the model that are purely image-based transforms from everything that requires the poses and the projection. We will make use of this fact in several proofs below. Note that if we move c and m to the calibration matrix, we must scale the distortion coefficients appropriately. For example, \(\kappa \) and \(K_1\) must be divided by \(c^2\) or \(m^2, K_2\) must be divided by \(c^4\) or \(m^4\), etc.

Remark 2

For lenses that are telecentric in object space, it is impossible to determine the principal point \((c_x, c_y)^\top \) uniquely if there are no distortions. In other words, the principal point is solely determined by the distortions for this kind of lens. If there are no distortions, the cameras are actually affine cameras in the sense of (Hartley and Zisserman 2003, Chapter 6.3). For these cameras, \((c_x, c_y)^\top \) and \((t_{l,x}, t_{l,y})^\top \) have the same effect. As described in (Hartley and Zisserman 2003, Chapter 6.3.3), the appropriate solution is to leave \((c_x, c_y)^\top \) at the initial values specified by the user (typically, the center of the image).

Remark 3

As described in Sect. 9, we calibrate the cameras using a planar calibration object. For perspective lenses, it is well known that for some configurations of poses of the calibration object, not all parameters of the camera model can be determined simultaneously (Sturm and Maybank 1999). These degeneracies must be taken into account by the user when acquiring images of the calibration object.

Remark 4

For telecentric lenses, the pose parameter \(t_{l,z}\) obviously cannot be determined. We arbitrarily set it to 1 m. In addition, in a multi-view setup, the relative pose parameters \((t_{k,x}, t_{k,y}, t_{k,z})\) cannot be determined uniquely since all cameras can be moved arbitrarily along their optical axes without changing the image geometry. To provide a well-defined relative pose, we move the origins of the camera coordinate systems along the respective optical axes to a sphere with radius 1 m. The center of the sphere is given by a point that lies at a distance of 1 m on the optical axis in front of the reference camera.

Remark 5

For telecentric cameras and planar calibration objects, the rotation part of the pose can only be determined up to a twofold ambiguity from a single camera. For example, a plane rotated by \(\alpha _l = 20^\circ \) looks identical to a plane rotated by \(\alpha _l = -20^\circ \). This is essentially the same as a Necker reversal (Hartley and Zisserman 2003, Chapter 14.6). In a multi-view setup, these individual pose ambiguities can be resolved, albeit only up to an overall Necker reversal, which will have to be resolved manually by the user.

7 Models for Cameras with Tilt Lenses

7.1 Camera Models

We will now extend the camera models of Sect. 6 to handle tilt lenses correctly. To do so, we must model the transformation that occurs when the image plane is tilted for lenses that are telecentric in image space (image-side telecentric and bilateral telecentric lenses) and for lenses that are perspective in image space (perspective and object-side telecentric lenses).

The camera models in Sect. 6 have proven their ability to model standard cameras correctly for many years. Therefore, the tilt camera models should reduce to the standard models if the image plane is not tilted. From the discussion at the end of Sect. 4, we can see that the parameters c and m essentially model the ray angles in object space correctly. Furthermore, as discussed by Sturm et al. (2010, Page 43), the distortion models discussed above essentially model distortions of ray angles with respect to the optical axis. In the above distortion models, the rays in image space are represented by their intersections with a plane that is perpendicular to the optical axis. This is convenient for untilted image planes, since this plane is already available: it is the image plane. Since the optical axis of the lens is unaffected by a tilt of the image plane, we can still use the above mechanism to represent the distortions, which models the distortions of ray angles with respect to the optical axis by way of their intersections with a plane that is perpendicular to the optical axis. Since the actual image plane is now tilted, the untilted image plane becomes a virtual image plane that is used solely for the purpose of representing ray angles with respect to the optical axis and thus to compute the distortions. Consequently, these two parts of the model, i.e., the modeling of the ray angles in object space by c or m and the modeling of the distortions in a plane that is perpendicular to the optical axis, can remain unchanged.

Both tilt models therefore work by projecting a point from a virtual image plane that is perpendicular to the optical axis to the tilted image plane. Therefore, we first describe how we can model the pose of the tilted image plane in a manner that is easy to understand for the user. As discussed at the end of Sect. 3, almost all tilt lenses work by first selecting the direction in which to tilt the lens and then tilting the lens. Selecting the direction in which to tilt essentially determines a rotation axis \({{\varvec{n}}}\) in a plane orthogonal to the optical axis, i.e., in the untilted image plane; tilting the lens then corresponds to rotating the image plane around that axis. Let the image coordinate system of the untilted image plane be given by the vectors \({{\varvec{x}}}_{\mathrm {u}}\) and \({{\varvec{y}}}_{\mathrm {u}}\). We can extend this image coordinate system to a 3D coordinate system by the vector \({{\varvec{z}}}_{\mathrm {u}}\), which points back to the scene along the optical axis. Figure 14 displays the untilted image plane and this coordinate system in medium gray. The axis \({{\varvec{n}}}\) around which the image plane is tilted can be parameterized by the angle \(\rho \) (\(0 \le \rho < 2 \pi \)) as follows:

$$\begin{aligned} {{\varvec{n}}} =\left( \begin{array}{c} \cos \rho \\ \sin \rho \\ 0\end{array} \right) . \end{aligned}$$
(21)

If we rotate the coordinate system \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) by the tilt angle \(\tau \) (\(0 \le \tau < \pi /2\)) around \({{\varvec{n}}}\), the coordinate axes \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) of the tilted image plane are given in the coordinate system of the untilted image plane by (see Spong et al. 2006, Chapter 2.5.3)

$$\begin{aligned} {{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}= & {} \left( \begin{array}{ccc} {{\varvec{x}}}_{\mathrm {t}}&{{\varvec{y}}}_{\mathrm {t}}&{{\varvec{z}}}_{\mathrm {t}} \end{array}\right) =\left( \begin{array}{ccc} r_{11} &{} r_{12} &{} r_{13} \\ r_{21} &{} r_{22} &{} r_{23} \\ r_{31} &{} r_{32} &{} r_{33} \end{array} \right) \nonumber \\= & {} \left( \begin{array}{ccc} c_\rho ^2 (1-c_\tau ) + c_\tau &{} c_\rho s_\rho (1-c_\tau ) &{} s_\rho s_\tau \\ c_\rho s_\rho (1-c_\tau ) &{} s_\rho ^2 (1-c_\tau ) + c_\tau &{} -c_\rho s_\tau \\ -s_\rho s_\tau &{} c_\rho s_\tau &{} c_\tau \end{array} \right) , \end{aligned}$$
(22)

where \(c_\theta = \cos \theta \) and \(s_\theta = \sin \theta \). Note that the rotation matrix \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\) also represents a transformation of points from the tilted coordinate system \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) to the untilted coordinate system \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\).

Note that the semantics of the tilt parameters are quite easy to understand. A rotation angle \(\rho = 0^\circ \) means that the lens (i.e., the optical axis) is tilted downwards by \(\tau \) with respect to the camera housing, for \(\rho = 90^\circ \), it is tilted leftwards, for \(\rho = 180^\circ \) upwards, and for \(\rho = 270^\circ \) rightwards.
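
For reference, the rotation matrix (22) can be transcribed directly into code; the following Python sketch (NumPy, with SciPy only for a cross-check) also verifies that (22) is the rotation by \(\tau \) around the axis \({{\varvec{n}}}\):

import numpy as np
from scipy.spatial.transform import Rotation

def tilt_rotation(rho, tau):
    """Rotation matrix R_t of Eq. (22): rotation by the tilt angle tau around
    the axis n = (cos rho, sin rho, 0) in the untilted image plane."""
    c_r, s_r = np.cos(rho), np.sin(rho)
    c_t, s_t = np.cos(tau), np.sin(tau)
    return np.array([
        [c_r**2 * (1 - c_t) + c_t, c_r * s_r * (1 - c_t),     s_r * s_t],
        [c_r * s_r * (1 - c_t),    s_r**2 * (1 - c_t) + c_t, -c_r * s_t],
        [-s_r * s_t,               c_r * s_t,                 c_t],
    ])

# Cross-check against a generic axis-angle rotation about n by tau.
rho, tau = np.deg2rad(135.0), np.deg2rad(6.0)
n = np.array([np.cos(rho), np.sin(rho), 0.0])
assert np.allclose(tilt_rotation(rho, tau),
                   Rotation.from_rotvec(tau * n).as_matrix())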

Fig. 14 The projection of a point \({{\varvec{p}}}_{\mathrm {u}}\) from the untilted image plane to a point \({{\varvec{p}}}_{\mathrm {t}}\) in the tilted image plane for a camera that is telecentric in image space. The coordinate system of the untilted image plane is given by \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\), that of the tilted image plane by \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\). The camera’s viewing direction is along the \({{\varvec{z}}}_{\mathrm {u}}\) axis, which points towards the scene. The rotation axis around which the image plane is tilted is given by \({{\varvec{n}}}\), which forms an angle \(\rho \) with \({{\varvec{x}}}_{\mathrm {u}}\). The image plane is tilted by the angle \(\tau \) around the axis \({{\varvec{n}}}\). The direction of the light ray is indicated by the vertical arrow, which goes through the image-side projection center (the exit pupil), located at infinity. Note that the image plane is shown in its correct orientation in the camera: upside-down

To project a point \({{\varvec{p}}}_{\mathrm {u}} = (x_{\mathrm {u}}, y_{\mathrm {u}})^\top \) from the untilted image plane to a point \({{\varvec{p}}}_{\mathrm {t}} = (x_{\mathrm {t}}, y_{\mathrm {t}})^\top \) in the tilted image plane, it is easiest to consider the inverse of this projection, i.e., to compute the orthographic projection from the tilted to the untilted image plane. It is well known that the orthographic projection of one vector onto another vector is given by the scalar product of the two vectors and that a linear transformation is given by projecting the axes of one coordinate system onto the axes of the other coordinate system (see, e.g., Spong et al. 2006, Chapter 2.2). Therefore, to compute the transformation from the tilted image plane to the untilted image plane, we must compute the projections (scalar products) of \({{\varvec{x}}}_{\mathrm {t}}\) and \({{\varvec{y}}}_{\mathrm {t}}\) onto \({{\varvec{x}}}_{\mathrm {u}}\) and \({{\varvec{y}}}_{\mathrm {u}}\). For example, the element \(h_{11}\) of the projection matrix is the projection of \({{\varvec{x}}}_{\mathrm {t}}\) onto \({{\varvec{x}}}_{\mathrm {u}}\). Since \({{\varvec{x}}}_{\mathrm {u}} = (1,0,0)^\top \) and \({{\varvec{x}}}_{\mathrm {t}} = (r_{11}, r_{21}, r_{31})^\top \) in the coordinate system of the untilted image plane, the projection is given by \({{\varvec{x}}}_{\mathrm {t}} \cdot {{\varvec{x}}}_{\mathrm {u}} = r_{11}\). If we do this for all the x and y coordinate system vectors, we see that the inverse tilt transformation is an affine transformation:

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}^{-1} = \left( \begin{array}{ccc} r_{11} &{} r_{12} &{} 0 \\ r_{21} &{} r_{22} &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(23)

Therefore, the transformation from untilted to tilted image coordinates for lenses that are telecentric in image space is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} = \left( \begin{array}{ccc} \frac{\displaystyle r_{22}}{\displaystyle r_{11} r_{22} - r_{12} r_{21}} &{} -\frac{\displaystyle r_{12}}{\displaystyle r_{11} r_{22} - r_{12} r_{21}} &{} 0 \\ -\frac{\displaystyle r_{21}}{\displaystyle r_{11} r_{22} - r_{12} r_{21}} &{} \frac{\displaystyle r_{11}}{\displaystyle r_{11} r_{22} - r_{12} r_{21}} &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(24)

We have \({{\varvec{p}}}_{\mathrm {t}} = {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} {{\varvec{p}}}_{\mathrm {u}}\). We insert this tilt transformation into the camera model between the distortion (10)–(12) and the transformation to the image coordinate system (13). Consequently, the world points are projected to the untilted image plane, distorted within the untilted image plane, transformed to the tilted image plane, and then transformed to the image coordinate system. Note that the tilt transformation is invariant to the choice of units. Consequently, the fact that we can move c or m to the transformation to the image plane, mentioned at the end of Sect. 6, is still valid.
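
In code, the transformation (23)/(24) amounts to inverting the upper-left \(2 \times 2\) block of \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\); a minimal sketch (Python/NumPy, with SciPy used only to build \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\)):

import numpy as np
from scipy.spatial.transform import Rotation

def tilt_homography_orthographic(rho, tau):
    """H_o of Eq. (24): maps points from the untilted to the tilted image
    plane for lenses that are telecentric in image space. It is the inverse
    of the affine transformation of Eq. (23)."""
    n = np.array([np.cos(rho), np.sin(rho), 0.0])
    R_t = Rotation.from_rotvec(tau * n).as_matrix()   # Eq. (22)
    H = np.eye(3)
    H[:2, :2] = np.linalg.inv(R_t[:2, :2])            # invert (r11 r12; r21 r22)
    return H

# p_t = H_o p_u in homogeneous coordinates:
H_o = tilt_homography_orthographic(np.deg2rad(90.0), np.deg2rad(6.0))
p_u = np.array([0.001, 0.002, 1.0])   # a point in the untilted image plane [m]
p_t = H_o @ p_u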

We now turn to cameras with lenses that are perspective in image space. Here, we must be able to model the different ray angles in object and image space correctly. From the discussion in Sect. 4 and Fig. 13, it is evident what must be done to model the different ray angles correctly: we must locate the untilted image plane at the true distance from the center of the exit pupil. This distance was called \(d'\) in Fig. 13. For simplicity, we will call it d from now on. We will refer to d as the image plane distance. We require \(0 < d < \infty \). Figure 15 displays this geometry. Points in object space are first projected to a virtual image plane that is orthogonal to the optical axis and lies at a distance c from the projection center O. Obviously, this causes the object and image space ray angles to be identical (\(\omega '' = \omega \)) and therefore causes the object space ray angles \(\omega \) to be modeled correctly. To model the image space ray angles correctly, the virtual image plane is shifted to a distance d (which corresponds to \(d'\) in Fig. 13), resulting in the correct image space ray angles \(\omega ' \ne \omega \). Note that this shift does not change the virtual image in any way. Next, the points are distorted in the virtual image plane. With the virtual image at its correct distance d, the plane can now be tilted by the correct tilt angle \(\tau \).

Fig. 15 The ray geometry of the perspective tilt camera model. Points in object space are first projected to a virtual image plane that is orthogonal to the optical axis and lies at a distance c from the projection center O. This causes the object and image space ray angles to be identical (\(\omega '' = \omega \)) and therefore causes the object space ray angles \(\omega \) to be modeled correctly. To model the image space ray angles correctly, the virtual image plane is shifted to a distance d (which corresponds to \(d'\) in Fig. 13), resulting in the correct image space ray angles \(\omega ' \ne \omega \). Next, the points are distorted in the virtual image plane. With the virtual image at its correct distance d, the plane can now be tilted by the correct tilt angle \(\tau \)

With this, the major difference to the telecentric model, as shown in Fig. 16, is that the image-side projection center (the center of the exit pupil) lies at a finite distance d in front of the image plane.

Fig. 16 The projection of a point \({{\varvec{p}}}_{\mathrm {u}}\) from the untilted image plane to a point \({{\varvec{p}}}_{\mathrm {t}}\) in the tilted image plane for a camera that is perspective in image space. The coordinate system of the untilted image plane is given by \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\), that of the tilted image plane by \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\). The camera’s viewing direction is along the \({{\varvec{z}}}_{\mathrm {u}}\) axis, which points towards the scene. The rotation axis around which the image plane is tilted is given by \({{\varvec{n}}}\), which forms an angle \(\rho \) with \({{\varvec{x}}}_{\mathrm {u}}\). The image plane is tilted by the angle \(\tau \) around the axis \({{\varvec{n}}}\). These two coordinate systems can also be attached to the image-side projection center (the center of the exit pupil) as \(({{\varvec{x}}}'_{\mathrm {u}}, {{\varvec{y}}}'_{\mathrm {u}}, {{\varvec{z}}}'_{\mathrm {u}})\) and \(({{\varvec{x}}}'_{\mathrm {t}}, {{\varvec{y}}}'_{\mathrm {t}}, {{\varvec{z}}}'_{\mathrm {t}})\). The distance from the projection center to the intersection of the optical axis with the untilted and tilted image planes is d. The coordinate system \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) lies in the tilted image plane at the perpendicular projection of the projection center. It lies at the distance \(d_{\mathrm {s}}\) from the projection center, has the same orientation as \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\), and is offset from this coordinate system by the vector \({{\varvec{o}}}\). Note that the image plane is shown in its correct orientation in the camera: upside-down

As shown in Fig. 16, the coordinate systems \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) and \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) have the same meaning as in the telecentric case. If we shift the two coordinate systems to the projection center and construct another coordinate system \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) in the tilted image plane that is parallel to \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) and lies at the perpendicular projection of the projection center into the tilted image plane, we see that \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) and \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) are equivalent to two perspective cameras that are rotated around their common projection center. Therefore, we can construct two calibration matrices \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}\) and \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {s}}\) and a rotation matrix \({{{\mathbf {\mathtt{{R}}}}}}\) that relate the two cameras. Then, the projection from \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) to \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) is given by \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {s}} {{{\mathbf {\mathtt{{R}}}}}} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}^{-1}\) (Hartley and Zisserman 2003, Chapter 8.4.2). Finally, we must shift \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) to \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) by a translation \({{\varvec{o}}}\) within the tilted image plane. This translation can be modeled by a translation matrix \({{{\mathbf {\mathtt{{T}}}}}}\). Thus, the complete projection is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} = {{{\mathbf {\mathtt{{T}}}}}} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {s}} {{{\mathbf {\mathtt{{R}}}}}} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}^{-1} . \end{aligned}$$
(25)

The calibration matrix \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}\) is obviously given by

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}} = \left( \begin{array}{ccc} d &{} 0 &{} 0 \\ 0 &{} d &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(26)

The distance \(d_{\mathrm {s}}\) of \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\) from the projection center is the orthogonal projection of the vector \({{\varvec{d}}} = (0,0,d)^\top \) onto the axis \({{\varvec{z}}}_{\mathrm {s}} = {{\varvec{z}}}_{\mathrm {t}}\), i.e., \(d_{\mathrm {s}} = {{\varvec{d}}} \cdot {{\varvec{z}}}_{\mathrm {t}} = d r_{33}\), with the rotation matrix \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\) as defined in (22). Therefore, we have

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {s}} = \left( \begin{array}{ccc} d r_{33} &{} 0 &{} 0 \\ 0 &{} d r_{33} &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(27)

The rotation matrix \({{{\mathbf {\mathtt{{R}}}}}}\) in (25) must transform points from \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) to points in \(({{\varvec{x}}}_{\mathrm {s}}, {{\varvec{y}}}_{\mathrm {s}}, {{\varvec{z}}}_{\mathrm {s}})\), i.e., in \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\). As discussed above, the matrix \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\) in (22) performs the inverse of this transformation. Thus,

$$\begin{aligned} {{{\mathbf {\mathtt{{R}}}}}} = {{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}^\top . \end{aligned}$$
(28)

Finally, the translation vector \({{\varvec{o}}}\) is the negative of the orthogonal projection of \({{\varvec{d}}}\) onto \({{\varvec{x}}}_{\mathrm {t}}\) and \({{\varvec{y}}}_{\mathrm {t}}\), i.e., \({{\varvec{o}}} = - ({{\varvec{d}}} \cdot {{\varvec{x}}}_{\mathrm {t}}, {{\varvec{d}}} \cdot {{\varvec{y}}}_{\mathrm {t}})^\top = - (d r_{31}, d r_{32})^\top \). Therefore,

$$\begin{aligned} {{{\mathbf {\mathtt{{T}}}}}} = \left( \begin{array}{ccc} 1 &{} 0 &{} -d r_{31} \\ 0 &{} 1 &{} -d r_{32} \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(29)

By substituting (26)–(29) into (25), we obtain

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} =\left( \begin{array}{ccc} r_{11} r_{33} - r_{13} r_{31} &{} r_{21} r_{33} - r_{23} r_{31} &{}0 \\ r_{12} r_{33} - r_{13} r_{32} &{} r_{22} r_{33} - r_{23} r_{32} &{} 0 \\ r_{13} / d &{} r_{23} / d &{} r_{33} \end{array} \right) . \end{aligned}$$
(30)

As above, we insert this tilt transformation into the camera model between the distortion (10)–(12) and the transformation to the image coordinate system (13).
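
The following sketch (Python/NumPy, with SciPy used to build \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}\)) composes \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}\) from (25)–(29) and checks that the result agrees with the closed form (30):

import numpy as np
from scipy.spatial.transform import Rotation

def tilt_homography_perspective(rho, tau, d):
    """H_p of Eqs. (25)-(30): maps points from the untilted to the tilted
    image plane for lenses that are perspective in image space."""
    n = np.array([np.cos(rho), np.sin(rho), 0.0])
    r = Rotation.from_rotvec(tau * n).as_matrix()      # R_t, Eq. (22)
    K_u = np.diag([d, d, 1.0])                         # Eq. (26)
    K_s = np.diag([d * r[2, 2], d * r[2, 2], 1.0])     # Eq. (27)
    R = r.T                                            # Eq. (28)
    T = np.array([[1.0, 0.0, -d * r[2, 0]],
                  [0.0, 1.0, -d * r[2, 1]],
                  [0.0, 0.0, 1.0]])                    # Eq. (29)
    return T @ K_s @ R @ np.linalg.inv(K_u)            # Eq. (25)

# Check against the closed form (30).
rho, tau, d = np.deg2rad(135.0), np.deg2rad(6.0), 0.05
r = Rotation.from_rotvec(tau * np.array([np.cos(rho), np.sin(rho), 0.0])).as_matrix()
H_closed = np.array([
    [r[0, 0] * r[2, 2] - r[0, 2] * r[2, 0], r[1, 0] * r[2, 2] - r[1, 2] * r[2, 0], 0.0],
    [r[0, 1] * r[2, 2] - r[0, 2] * r[2, 1], r[1, 1] * r[2, 2] - r[1, 2] * r[2, 1], 0.0],
    [r[0, 2] / d,                           r[1, 2] / d,                           r[2, 2]],
])
assert np.allclose(tilt_homography_perspective(rho, tau, d), H_closed)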

7.2 Model Properties and Degeneracies

Proposition 1

The tilt camera models reduce to the standard camera models if the image plane is not tilted.

Proof

Inserting \(\tau = 0\) into (22), we obtain \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}} = {{{\mathbf {\mathtt{{I}}}}}}\). Therefore, \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} = {{{\mathbf {\mathtt{{I}}}}}}\) and \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} = {{{\mathbf {\mathtt{{I}}}}}}\). \(\square \)

Corollary 1

For lenses that are perspective in image space, the image plane distance d cannot be determined if the image plane is not tilted.

Proof

This is obvious from Proposition 1 and the discussion in Sect. 4. This also means that the user should select the standard perspective camera model if there is no tilt. \(\square \)

Remark 6

A consequence of Corollary 1 is that the smaller \(\tau \) is, the less precisely d can be determined. If the calibration has converged with a small RMS error, this is of no concern since the calibrated system will be consistent with the imaging geometry.

Proposition 2

The tilt homographies in (24) and (30) are consistent with each other, i.e., \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} \rightarrow {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}\) as \(d \rightarrow \infty \).

Proof

By substituting (22) into (24) and (30) and simplifying the trigonometric terms, we obtain

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} = \left( \begin{array}{ccc} \frac{\displaystyle c_\rho ^2 c_\tau + s_\rho ^2}{\displaystyle c_\tau } &{} \frac{\displaystyle c_\rho s_\rho (c_\tau - 1)}{\displaystyle c_\tau } &{} 0 \\ \frac{\displaystyle c_\rho s_\rho (c_\tau - 1)}{\displaystyle c_\tau } &{} \frac{\displaystyle s_\rho ^2 c_\tau + c_\rho ^2}{\displaystyle c_\tau } &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) \end{aligned}$$
(31)

and

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} = \left( \begin{array}{ccc} c_\rho ^2 c_\tau + s_\rho ^2 &{} c_\rho s_\rho (c_\tau - 1) &{} 0 \\ c_\rho s_\rho (c_\tau - 1) &{} s_\rho ^2 c_\tau + c_\rho ^2 &{} 0 \\ s_\rho s_\tau / d &{} -c_\rho s_\tau / d &{} c_\tau \end{array} \right) . \end{aligned}$$
(32)

If we dehomogenize \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}\) by dividing by \(c_\tau \), the result follows. \(\square \)
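
The limit in Proposition 2 can also be checked numerically; a small sketch using the closed forms (31) and (32):

import numpy as np

def H_o(rho, tau):   # Eq. (31)
    c_r, s_r, c_t = np.cos(rho), np.sin(rho), np.cos(tau)
    return np.array([[(c_r**2 * c_t + s_r**2) / c_t, c_r * s_r * (c_t - 1) / c_t, 0.0],
                     [c_r * s_r * (c_t - 1) / c_t, (s_r**2 * c_t + c_r**2) / c_t, 0.0],
                     [0.0, 0.0, 1.0]])

def H_p(rho, tau, d):   # Eq. (32)
    c_r, s_r = np.cos(rho), np.sin(rho)
    c_t, s_t = np.cos(tau), np.sin(tau)
    return np.array([[c_r**2 * c_t + s_r**2, c_r * s_r * (c_t - 1), 0.0],
                     [c_r * s_r * (c_t - 1), s_r**2 * c_t + c_r**2, 0.0],
                     [s_r * s_t / d, -c_r * s_t / d, c_t]])

rho, tau = np.deg2rad(135.0), np.deg2rad(6.0)
# After dehomogenizing H_p (dividing by cos tau), the entries in its last row
# vanish as d grows, and H_p approaches H_o.
assert np.allclose(H_p(rho, tau, 1e9) / np.cos(tau), H_o(rho, tau), atol=1e-8)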

Remark 7

Lenses with \(d \ge 1000\) m can be regarded as telecentric in image space for all practical purposes.

Remark 8

It is trivial to show that in the perspective tilt camera model, c and d are related to the pupil magnification factor (2) by

$$\begin{aligned} m_{\mathrm {p}} = \frac{d}{c} . \end{aligned}$$
(33)

Proposition 3

If we denote the orthographic tilt homography in (31) as \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho ,\tau )\) to make the parameters explicit, we have

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho ,-\tau )= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho ,\tau ) \end{aligned}$$
(34)
$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho +\pi ,\tau )= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho ,\tau ) \end{aligned}$$
(35)
$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho +\pi ,\tau )= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}(\rho ,-\tau ) . \end{aligned}$$
(36)

Proof

This follows directly from (31). \(\square \)

Remark 9

Proposition 3 shows that we cannot determine the tilt of the image plane uniquely for lenses that are telecentric in image space, even if we require \(\tau > 0\). This is obvious from the geometry of the orthographic tilt homography. Note that this is the image space analog of the degeneracy described in Remark 5. The ambiguity can be resolved if we require \(0 \le \rho < \pi \). In practice, however, this would confuse the users, who typically would be surprised to have \(\rho \) reduced to this range if they specified a value of \(\rho \) outside this interval as the initial value (e.g., if the calibration returned that the lens is tilted right when the initial values specified that it is tilted left). Since the calibration will typically converge to the same half space in which the initial \(\rho \) was specified, we do not reduce \(\rho \) modulo \(\pi \).

Proposition 4

If we denote the perspective tilt homography in (32) as \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho ,\tau ,d)\) to make the parameters explicit, we have

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho ,-\tau ,d)= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho +\pi ,\tau ,d) \end{aligned}$$
(37)
$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho ,\tau ,-d)= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho ,-\tau ,d) \end{aligned}$$
(38)
$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho ,\tau ,-d)= & {} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(\rho +\pi ,\tau ,d) . \end{aligned}$$
(39)

Proof

This follows directly from (32). \(\square \)

Remark 10

Proposition 4 shows that there are no degeneracies in the perspective tilt homography if we require \(0 \le \tau< \pi /2, 0 \le \rho < 2\pi \), and \(d > 0\).

Proposition 5

If a lens that is perspective in image space is tilted around the horizontal or vertical axis, i.e., if \(\rho \in \{ 0, \pi /2, \pi , 3\pi /2 \}\), the values of \(\tau , d, s_x\), and \(s_y\) cannot be determined uniquely. This degeneracy can be resolved if the pixel aspect ratio is known.

Proof

We will only prove the case \(\rho = 0\). The other cases can be proved in an analogous manner. The calibration matrix \({{{\mathbf {\mathtt{{K}}}}}}\) in (14) can be split into a scaling and a translation part. Only the scaling part is relevant for this proof, i.e., we can set \(c_x = c_y = 0\). If we multiply \({{{\mathbf {\mathtt{{K}}}}}}(s_x,s_y)\) and \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(0,\tau ,d)\), we obtain

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}}(s_x,s_y) {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(0,\tau ,d) = \left( \begin{array}{ccc} \cos \tau / s_x &{} 0 &{} 0 \\ 0 &{} 1 / s_y &{} 0 \\ 0 &{} -\sin \tau / d &{} \cos \tau \end{array} \right) . \end{aligned}$$
(40)

Suppose one set of parameters \((s_{x,1}, s_{y,1}, \tau _1, d_1)\) is given. We want to find a different solution \((s_{x,2}, s_{y,2}, \tau _2, d_2)\). Since (40) represents a homogeneous matrix, we require

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}}(s_{x,1},s_{y,1}) {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(0,\tau _1,d_1) = s {{{\mathbf {\mathtt{{K}}}}}}(s_{x,2},s_{y,2}) {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}(0,\tau _2,d_2) , \end{aligned}$$
(41)

where s is a scaling factor. This results in four equations:

$$\begin{aligned} \cos \tau _1 / s_{x,1}= & {} s \cos \tau _2 / s_{x,2} \end{aligned}$$
(42)
$$\begin{aligned} 1 / s_{y,1}= & {} s / s_{y,2} \end{aligned}$$
(43)
$$\begin{aligned} \sin \tau _1 / d_1= & {} s \sin \tau _2 / d_2 \end{aligned}$$
(44)
$$\begin{aligned} \cos \tau _1= & {} s \cos \tau _2 . \end{aligned}$$
(45)

We can select one of the four parameters on the right-hand side of (41) arbitrarily, e.g., \(\tau _2\). We can then solve (45) for s:

$$\begin{aligned} s = \frac{\cos \tau _1}{\cos \tau _2} . \end{aligned}$$
(46)

Substituting \(\cos \tau _1\) from (45) into (42) and solving for \(s_{x,2}\) yields

$$\begin{aligned} s_{x,2} = s_{x,1} . \end{aligned}$$
(47)

Substituting s from (46) into (43) and solving for \(s_{y,2}\) yields

$$\begin{aligned} s_{y,2} = s_{y,1} \frac{\cos \tau _1}{\cos \tau _2} . \end{aligned}$$
(48)

Finally, substituting s into (44) and solving for \(d_2\) yields

$$\begin{aligned} d_2 = d_1 \frac{\tan \tau _2}{\tan \tau _1} . \end{aligned}$$
(49)

Therefore, we can find a valid solution for any chosen \(\tau _2\) by (47)–(49). From (48), we see that a unique solution can be enforced by requiring \(s_{y,2} = s_{y,1}\), i.e., by assuming that the aspect ratio of the pixels is known. \(\square \)
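
The one-parameter family of equivalent solutions constructed in the proof can be verified numerically; a short sketch of Eqs. (46)–(49) applied to Eq. (40):

import numpy as np

def KH(s_x, s_y, tau, d):
    """K(s_x, s_y) H_p(0, tau, d) of Eq. (40), with the principal point set to 0."""
    return np.array([[np.cos(tau) / s_x, 0.0, 0.0],
                     [0.0, 1.0 / s_y, 0.0],
                     [0.0, -np.sin(tau) / d, np.cos(tau)]])

# One set of parameters ...
s_x1, s_y1, tau1, d1 = 3.45e-6, 3.45e-6, np.deg2rad(5.0), 0.05
# ... and an equivalent one, constructed from an arbitrarily chosen tau2:
tau2 = np.deg2rad(8.0)
s_x2 = s_x1                                  # Eq. (47)
s_y2 = s_y1 * np.cos(tau1) / np.cos(tau2)    # Eq. (48)
d2 = d1 * np.tan(tau2) / np.tan(tau1)        # Eq. (49)
s = np.cos(tau1) / np.cos(tau2)              # Eq. (46)

# Both parameter sets yield the same homogeneous matrix up to the scale s, Eq. (41).
assert np.allclose(KH(s_x1, s_y1, tau1, d1), s * KH(s_x2, s_y2, tau2, d2))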

Remark 11

We have shown that Proposition 5 holds under the assumption that \(s_x\) and \(s_y\) can vary. As discussed in Remark 1, this implies that the degeneracy is present even if \(s_y\) is fixed since the effect of \(s_x\) and \(s_y\) can also be modeled by c or m and \(s_x\).

Remark 12

Instead of selecting \(\tau _2\), we could also have selected \(d_2\) and solved for the remaining parameters. In particular, we would then have obtained

$$\begin{aligned} \tau _2 = \arctan \biggl ( \frac{d_2}{d_1} \tan \tau _1 \biggr ) . \end{aligned}$$
(50)

If we regard \(d_1 = d\) as the true image plane distance and imagine that we force the image space ray angles to be identical to the object space ray angles by setting \(d_2 = c\), we can see that the tilt angle \(\tau _2\) will be incorrect if \(d \ne c\). For example, if d is actually three times as large as c and if \(\tau _1 = 5^\circ \), we have \(\tau _2 \approx 1.67^\circ \). From (48), it follows that \(s_{y,2}\) will be wrong by about \(-0.34\) %.
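
The numbers in this example can be reproduced with a few lines:

import numpy as np

tau1 = np.deg2rad(5.0)
d1, d2 = 3.0, 1.0                              # d is three times as large as c; set d2 = c
tau2 = np.arctan(d2 / d1 * np.tan(tau1))       # Eq. (50)
s_y_error = np.cos(tau1) / np.cos(tau2) - 1.0  # relative error of s_y, from Eq. (48)
print(np.rad2deg(tau2), s_y_error)             # approx. 1.67 degrees and -0.34 %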

Remark 13

Proposition 5 and Remark 12 show that for all the perspective tilt camera models that force the ray angles in object and image space to be identical (see Sect. 5.1), the following properties hold if the ray angles are actually different:

  • If the tilt is in the vertical or horizontal direction and the approach calibrates the pixel aspect ratio, the calibration will converge with a low RMS error, i.e., the model will be consistent with the imaging geometry, but the tilt angle and the aspect ratio of the pixels will be wrong.

  • If the tilt is in the vertical or horizontal direction and the approach does not calibrate the pixel aspect ratio, the calibration will converge with a large RMS error, i.e., the model will be inconsistent with the imaging geometry, and at least the tilt angle will be wrong.

  • If the tilt is not in the vertical or horizontal direction, the calibration will converge with a large RMS error, i.e., the model will be inconsistent with the imaging geometry, and at least the tilt angle will be wrong.

Remark 14

Proposition 5 shows that the aspect ratio must be fixed in the calibration if the tilt is in the horizontal or vertical direction. This means that the user must exclude \(s_y\) from being calibrated if the tilt is close to the horizontal or vertical direction.

Remark 15

The degeneracy in Proposition 5 is the image space analog to the fact that a regular camera cannot be calibrated from a single image of a planar calibration object that is rotated around the horizontal or vertical image axis, even if the principal point is known, as shown by Sturm and Maybank (1999).

Proposition 6

For a lens that is telecentric in image space, the parameters c or \(m, \rho , \tau , s_x\), and \(s_y\) cannot be determined uniquely (even if \(s_y\) is fixed). This degeneracy can be resolved if \(s_x\) is known.

Proof

For tilts in the horizontal or vertical direction, it is obvious that \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}\) simply scales the image by a factor of \(1/\cos \tau \) in the horizontal or vertical direction. Obviously, this can also be modeled by c or m and \(s_x\). Fixing \(s_x\) removes this degeneracy. For tilts in any other direction, the claim follows from Theorems 1 and 2, to be discussed in Sect. 8. \(\square \)

Remark 16

While the parameterization in (22) is very easy to understand for users, it has the disadvantage that \(\rho \) is undetermined if \(\tau =0\). We could simply set \(\rho =0\) in this case, but this does not work in the camera calibration, where both parameters must be optimized. Therefore, internally we use the Rodrigues parameterization (Lenz 1988, Chapter 10.2; Morawiec and Field 1996), also called Gibbs or Cayley parameterization (Bauchau and Trainelli 2003). Here, the rotation is parameterized by \({{\varvec{r}}} = (r_x, r_y, r_z)^\top = {{\varvec{n}}} \tan (\tau /2)\). In our case, \(r_z = 0\) and we have:

$$\begin{aligned} {{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {t}}= & {} \frac{1}{1 + r_x^2 + r_y^2} \nonumber \\&\times \left( \begin{array}{ccc} 1 + r_x^2 - r_y^2 &{} 2 r_x r_y &{} 2 r_y \\ 2 r_x r_y &{} 1 - r_x^2 + r_y^2 &{} -2 r_x \\ -2 r_y &{} 2 r_x &{} 1 - r_x^2 - r_y^2 \end{array} \right) . \end{aligned}$$
(51)

The only singularity of the Rodrigues parameterization occurs for \(\tau = \pm \pi \), a case that is of no interest to us. We convert the initial values of \(\tau \) and \(\rho \) specified by the user to \({{\varvec{r}}}\) for the calibration and convert the calibrated \({{\varvec{r}}}\) back to \(\tau \) and \(\rho \) on output. The price we pay to avoid the singularity is that the user can no longer force the tilt to occur in a particular direction since the tilt direction and tilt angle are jointly modeled by \({{\varvec{r}}}\) and cannot be excluded separately from the optimization.
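
A minimal sketch of this internal parameterization (conversion in both directions and the rotation matrix (51)); the function names are illustrative:

import numpy as np

def rodrigues_from_tilt(rho, tau):
    """Rodrigues/Gibbs vector r = n tan(tau/2) with r_z = 0."""
    return np.tan(tau / 2.0) * np.array([np.cos(rho), np.sin(rho)])

def tilt_from_rodrigues(r_x, r_y):
    """Convert r back to (rho, tau); rho is returned in (-pi, pi]."""
    tau = 2.0 * np.arctan(np.hypot(r_x, r_y))
    rho = np.arctan2(r_y, r_x) if tau != 0.0 else 0.0
    return rho, tau

def rotation_from_rodrigues(r_x, r_y):
    """Tilt rotation matrix of Eq. (51)."""
    m = np.array([[1 + r_x**2 - r_y**2, 2 * r_x * r_y,        2 * r_y],
                  [2 * r_x * r_y,       1 - r_x**2 + r_y**2, -2 * r_x],
                  [-2 * r_y,            2 * r_x,              1 - r_x**2 - r_y**2]])
    return m / (1.0 + r_x**2 + r_y**2)

# Round trip between the two parameterizations:
rho, tau = np.deg2rad(135.0), np.deg2rad(6.0)
r_x, r_y = rodrigues_from_tilt(rho, tau)
assert np.allclose(tilt_from_rodrigues(r_x, r_y), (rho, tau))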

Remark 17

In the camera models we have proposed, the principal point is the intersection of the optical axis with the image plane. This point is sometimes called the autocollimation point in photogrammetry (Luhmann et al. 2014, Chapter 3.3.2.2).

8 Tilt Cameras in Projective Geometry

As discussed in Sects. 6 and 7, the proposed models for cameras with tilt lenses can be represented by matrices if there are no distortions. In this section, we are only interested in a single camera with a single pose. Therefore, we will denote the pose matrix by \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}}\). Thus, a perspective tilt lens camera is given by the camera matrix

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {pp}} = {{{\mathbf {\mathtt{{K}}}}}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {p}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}} , \end{aligned}$$
(52)

where \({{{\mathbf {\mathtt{{K}}}}}}, {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}}, {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {p}}\), and \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}}\) are given by (20), (32), (16), and (4), respectively. An image-side telecentric tilt lens camera is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {po}} = {{{\mathbf {\mathtt{{K}}}}}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {p}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}} , \end{aligned}$$
(53)

where \({{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}}\) is given by (31). An object-side telecentric tilt lens camera is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {op}} = {{{\mathbf {\mathtt{{K}}}}}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {p}} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {o}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}} , \end{aligned}$$
(54)

where \({{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {o}}\) is given by (18). Finally, a bilateral telecentric tilt lens camera is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {oo}} = {{{\mathbf {\mathtt{{K}}}}}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {o}} {{{\mathbf {\mathtt{{P}}}}}}_{\mathrm {o}} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {w}} . \end{aligned}$$
(55)

In this section, we will examine how tilt cameras are related to projective cameras, affine cameras, and general cameras at infinity (see Hartley and Zisserman 2003, Chapters 6.1–6.3). In particular, we examine the question whether these three camera types can be interpreted as cameras with tilt lenses. The proofs of the theorems we will present make use of the dual image of the absolute conic (DIAC) (Hartley and Zisserman 2003, Chapter 8.5), given by

$$\begin{aligned} \mathbf {\omega }^*= {{{\mathbf {\mathtt{{P}}}}}} {{{\mathbf {\mathtt{{Q}}}}}}_\infty ^* {{{\mathbf {\mathtt{{P}}}}}}^\top , \end{aligned}$$
(56)

where \({{{\mathbf {\mathtt{{Q}}}}}}_\infty ^* = {\mathrm {diag}}(1,1,1,0)\) is the canonical form of the absolute dual quadric (Hartley and Zisserman 2003, Chapter 3.7). (The function \({\mathrm {diag}}\) constructs a diagonal matrix with the specified elements.) This will allow us to remove the exterior orientation from the equations to be solved. If we denote the entries of \({{{\mathbf {\mathtt{{P}}}}}}\) by \(p_{ij}\), the elements \(\omega _{ij}\) of the DIAC \(\mathbf {\omega }^*\) are given by

$$\begin{aligned} \omega _{ij} = \sum _{k=1}^3 p_{ik} p_{jk} \end{aligned}$$
(57)

Note that \(\mathbf {\omega }^*\) is a symmetric matrix.
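
In code, the DIAC is simply the Gram matrix of the first three columns of the camera matrix; a small sketch:

import numpy as np

def diac(P):
    """Dual image of the absolute conic, Eq. (56): omega* = P Q*_inf P^T."""
    Q_inf = np.diag([1.0, 1.0, 1.0, 0.0])   # absolute dual quadric, canonical form
    return P @ Q_inf @ P.T

# For any 3x4 camera matrix P, Eq. (57) gives the same result, and omega* is symmetric.
P = np.arange(12, dtype=float).reshape(3, 4)
omega = diac(P)
assert np.allclose(omega, P[:, :3] @ P[:, :3].T)
assert np.allclose(omega, omega.T)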

Theorem 1

Every affine camera can be regarded as a bilateral telecentric tilt camera with square pixels and principal point at \((0,0)^\top \).

Proof

See Appendix A.1. \(\square \)

Remark 18

This proves Proposition 6 for bilateral telecentric tilt cameras, even if the pixels are non-square and if there are distortions. Let us assume that the pixels are non-square and that the aspect ratio b of the pixels is known and fixed. This can be modeled by a calibration matrix of the form \({\mathrm {diag}}(b a, a, 1)\). We can multiply the camera matrix \({{{\mathbf {\mathtt{{M}}}}}}\) by \({\mathrm {diag}}(1/b, 1, 1)\) from the left to obtain a camera matrix with square pixels and then apply the approach given in the proof of Theorem 1. If there are distortions, Proposition 6 is true since the distortions are merely a transformation within the untilted image plane. Consequently, they do not alter the essential projection properties of the camera, i.e., for bilateral telecentric tilt lenses, the locations of the entrance and exit pupil at infinity are not altered by the distortions. Therefore, the exterior orientation and the pose of the image plane are invariant to distortions.

Theorem 2

Every finite projective camera can be regarded as an image-side telecentric tilt camera with square pixels.

Proof

See Appendix A.2. \(\square \)

Remark 19

This proves Proposition 6 for image-side telecentric tilt cameras, even if the pixels are non-square and if there are distortions. The argument is almost identical to Remark 18. The only modification is that we must multiply \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {c}} {{{\mathbf {\mathtt{{M}}}}}}\) from the left by the matrix \({\mathrm {diag}}(1/b, 1, 1)\), where \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {c}}\) is given by (103).

Theorem 3

Every general camera at infinity can be regarded as an object-side telecentric tilt camera.

Proof

See Appendix A.3. \(\square \)

Remark 20

The space of general cameras at infinity has ten degrees of freedom: twelve for the entries of the camera matrix less one for the arbitrary scale of the matrix less one for the constraint that the left \(3 \times 3\) submatrix is singular. To remove the scale factor, one can also normalize the camera matrix by \(\Vert {{{\mathbf {\mathtt{{M}}}}}} \Vert _\mathrm {F} = 1\) using the Frobenius norm. Therefore, the space of general cameras at infinity can also be regarded as a subspace (given by \(\det {{{\mathbf {\mathtt{{M}}}}}}_3 = 0\)) of the unit sphere \(S^{11}\) in \(\mathbb {R}^{12}\) with antipodes identified (the latter is isomorphic to the projective space \(\mathbb {P}^{11}(\mathbb {R})\)).

Theorem 3 establishes a minimal parameterization of the non-degenerate part of this space in terms of the parameters \(a_x, a_y, \tau , \rho , d, t_x, t_y\), and three rotation parameters using a set of maps that are parameterized by a finite set of values of \((c_x, c_y)^\top \). This shows that the space has dimension 10, except in its degenerate part.

Corollary 2

Every projective camera can be regarded as a tilt camera.

Proof

This follows from Theorems 1–3. Affine cameras can be regarded as bilateral telecentric tilt cameras, general cameras at infinity can be regarded as object-side telecentric tilt cameras, and finite projective cameras can be regarded as image-side telecentric tilt cameras.\(\square \)

9 Calibration

The camera calibration is performed using a planar calibration object. The advantages of planar calibration objects are that they are easier to manufacture accurately and that they are easier for users to handle. Furthermore, with planar calibration objects, it is very easy to define a plane in the world by simply placing the calibration object onto the plane of interest. With this, it is possible to measure in world coordinates within the plane with a single camera, as described by Steger et al. (2008, Chapter 3.9.4). Another distinctive advantage of planar calibration objects is that they can be used conveniently in backlight applications if the calibration object is opaque and the control points are transparent (see, e.g., Steger et al. 2008, Chapter 4.7).

The calibration object uses a hexagonal layout of control points, as shown in Fig. 17. The hexagonal layout provides the highest density of control points of any layout. The calibration object has five finder patterns, indicated by the small dark circles within the control points, that facilitate the unique determination of the pose of the calibration object even if it is only partially visible in the image. This is advantageous since the calibration object can be imaged such that it covers the entire field of view of the camera. This increases the accuracy with which the camera parameters, in particular, the distortion parameters, can be determined.

Fig. 17 An example of the planar calibration object

Let the known 3D coordinates of the centers of the control points of the calibration object be denoted by \({{\varvec{p}}}_j\) (\(j = 1, \ldots , n_{\mathrm {m}}\)). As described in Sect. 6, the user conceptually acquires \(n_{\mathrm {o}}\) images of the calibration object with each of the \(n_{\mathrm {c}}\) cameras. The calibration object, however, does not need to be visible in all images simultaneously. If the calibration object is outside the field of view of a particular camera for a particular pose, no image needs to be acquired. There simply will be no observation of the calibration object in this case, which will be modeled appropriately in the optimization. However, there must be a chain of observations of the calibration object in multiple cameras that connects all the cameras. As described in Sect. 6, the pose (exterior orientation) of the calibration object in the reference camera is denoted by \({{{\mathbf {\mathtt{{H}}}}}}_l\) (\(l = 1, \ldots , n_{\mathrm {o}}\)) and the relative orientation of the cameras with respect to the reference camera is denoted by \({{{\mathbf {\mathtt{{H}}}}}}_k\) (\(k = 1, \ldots , n_{\mathrm {c}}\)). Let us denote the corresponding parameters by the vectors \({{\varvec{e}}}_l\) and \({{\varvec{r}}}_k\). Let the interior orientation of camera k be denoted by the vector \({{\varvec{i}}}_k\). Furthermore, let the projection from world coordinates to the undistorted image plane, i.e., (3), (5), and, depending on the camera model, (6) or (8), be denoted by \(\pi _{\mathrm {u}}\). Let the projection from the image coordinate system back to the undistorted image plane, i.e., the inverse of (13), optionally the inverse of (24) or (30), depending on the camera model, and (10) or (12), depending on the camera model, be denoted by \(\pi _{\mathrm {i}}^{-1}\). In addition, let \(v_{jkl}\) denote a function that is 1 if the control point j of the observation l of the calibration object is visible with camera k, and 0 otherwise. The user effectively specifies the kl part of this function (albeit in a user-friendly manner) during the acquisition of the calibration images. The j part of this function is automatically determined during the extraction of the images of the control points, denoted by \({{\varvec{p}}}_{jkl}\), from the calibration images.

To calibrate an arbitrary combination of cameras, the following function is minimized:

$$\begin{aligned} \varepsilon ^2 = \sum _{l=1}^{n_{\mathrm {o}}} \sum _{k=1}^{n_{\mathrm {c}}} \sum _{j=1}^{n_{\mathrm {m}}} v_{jkl} \Vert \pi _{\mathrm {i}}^{-1}( {{\varvec{p}}}_{jkl}, {{\varvec{i}}}_{k}) - \pi _{\mathrm {u}}( {{\varvec{p}}}_j, {{\varvec{e}}}_l, {{\varvec{r}}}_k, {{\varvec{i}}}_{k}) \Vert _2^2 . \end{aligned}$$
(58)

The minimization is performed by a suitable version of the sparse Levenberg–Marquardt algorithms described in (Hartley and Zisserman 2003, Appendix A6).
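
As a schematic sketch of the objective (58) (not the paper's implementation: the projection functions \(\pi _{\mathrm {u}}\) and \(\pi _{\mathrm {i}}^{-1}\), the parameter unpacking, and the observation list are placeholders, and SciPy's dense Levenberg–Marquardt is used in place of a sparse variant):

import numpy as np
from scipy.optimize import least_squares

def residuals(params, observations, unpack, pi_u, pi_i_inv):
    """Stacked residuals of Eq. (58). `unpack` splits the parameter vector into
    the exterior orientations e_l, the relative orientations r_k, and the
    interior orientations i_k; `observations` lists (j, k, l, p_jkl, p_j) for
    all control points with v_jkl = 1."""
    e, r, i = unpack(params)
    res = []
    for j, k, l, p_jkl, p_j in observations:
        res.append(pi_i_inv(p_jkl, i[k]) - pi_u(p_j, e[l], r[k], i[k]))
    return np.concatenate(res)

# result = least_squares(residuals, x0, method="lm",
#                        args=(observations, unpack, pi_u, pi_i_inv))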

The points \({{\varvec{p}}}_{jkl}\) are extracted by fitting ellipses (Fitzgibbon et al. 1999) to edges extracted with a subpixel-accurate edge extractor (Steger 1998, Chapter 3.3; Steger 2000). As discussed in Sect. 5.2, this causes a bias in the point positions. Section 10 will explain how this bias can be removed.

The Levenberg–Marquardt algorithm requires initial values for the camera parameters. One advantage of the camera model proposed in Sects. 6 and 7 is that the initial values for the interior orientation parameters can be directly obtained from the specifications of the camera and lens. For example, the initial values of \(s_x\) and \(s_y\) can be read off the data sheet of the camera since they are the pixel size of the sensor elements. The initial value of c or m can be obtained from the lens barrel or from the data sheet of the lens. Some lens manufacturers provide Gaussian optics data for their lenses. If the lens manufacturer provides the distance of the exit pupil from the image plane, that value can be used as the initial value for d. If not, for perspective lenses the user can roughly estimate the pupil magnification factor \(m_{\mathrm {p}}\) by eye by looking at the lens from the front and back and then use \(d = m_{\mathrm {p}} c\) as the initial value. For object-side telecentric lenses, the initial value for d typically corresponds roughly to the flange focal distance (for example, on the order of 20–50 mm for a C mount camera, for which the flange focal distance is 17.526 mm). Initial values for \(\tau \) and \(\rho \) are typically known from the considerations that led to the use of the tilt lens in the first place, i.e., from the Scheimpflug principle and Eq. (1). The initial value of \((c_x,c_y)^\top \) can be set to the center of the image and the initial values for the distortion parameters can be set to 0. In our experience, the choice of the initial parameters is not particularly critical as long as the initial value is within a relatively large factor (typically more than 5) of the actual value. Therefore, for example, the initial value of d for perspective tilt lenses can usually be set to the initial value of c.

One advantage of using circular marks is that initial parameters for the exterior orientation of the calibration object in each camera can be directly inferred from the shapes of the ellipses in the images (Lanser 1997, Chapter 3.2.1.2; Lanser et al. 1995). Initial values for the relative orientations of the cameras can be computed based on the initial values for the exterior orientations.

The calibration algorithm allows any parameter of the interior, exterior, and relative orientation to be excluded from the optimization, either globally or individually for each camera. The sole exception is that the tilt angles \(\rho \) and \(\tau \) can only be excluded jointly. This is a result of the implementation choice that internally the Rodrigues parameterization is used to represent the tilt angles (see Remark 16). The parameter exclusion mechanism is used to handle all the generic degeneracies described in Sects. 6.2 and 7.2 automatically. For example, \(s_x\) is excluded automatically if the user selects a bilateral or image-side telecentric tilt lens (see Proposition 6). Degeneracies that occur only for certain imaging geometries (e.g., the degeneracy described in Proposition 5 and Remarks 11, 12, and 14) must be handled by the user.

The proposed calibration algorithm provides some features that are novel and useful for users. First of all, an arbitrary combination of perspective and telecentric cameras can be calibrated simultaneously, including their relative orientation. This facilitates a multi-view stereo reconstruction using an arbitrary combination of cameras. We will discuss this aspect in more detail in Sect. 12. Second, the formulation in (58) of minimizing the error in the undistorted image plane allows an arbitrary mixture of distortion models to be calibrated efficiently. In contrast, the typical projection to the image coordinate system would mean that the polynomial distortion model would have to be inverted by a numerical root-finding algorithm, which is significantly less efficient. Finally, the fact that the layout of the calibration object was designed to fill the entire field of view and to maximize the density of the control points means that fewer calibration images must be acquired by the user to achieve the desired accuracy.

10 Bias Removal

In this section, we will describe two approaches to remove the control point bias.

The first approach is an extension of the approach by Heikkilä (2000). As discussed in Sect. 5.2, Heikkilä’s approach does not consider distortion bias, which means that it does not fully remove the bias. Therefore, we extend his approach to correctly handle the distortion bias. The core idea to achieve this was already described by Lanser (1997, Chapter 3.2.1). To remove the distortion bias, we transform the extracted subpixel-accurate control point contours to the undistorted image plane with the transformation \(\pi _{\mathrm {i}}^{-1}\) (see Sect. 9) using the current set of interior orientation parameters. Ellipses are then fitted to the undistorted control point contours. Afterwards, the bias correction computed by the algorithm of Heikkilä (2000) is applied to the results of the ellipse fitting. The complete algorithm first calibrates the cameras using the biased control point positions, applies the bias correction, and calibrates the cameras anew using the unbiased control point positions. The second step could be iterated. However, this typically results in insignificant improvements of the camera parameters, and therefore is not done in our implementation.

The above algorithm provides very accurate results, as the experiments below show. However, it does not work for line-scan cameras since for these cameras there is no undistorted image plane to which the contours could be rectified. The second approach we propose can also be used for line-scan cameras. It is basically an efficient version of the approaches proposed by Datta et al. (2009) and Vo et al. (2011). To remove the bias, we transform the control point contours back to the plane of the calibration object with the approach described by Steger et al. (2008, Chapter 3.9.4) using the current estimates of the interior orientation of the camera and the exterior orientation of the calibration object. Then, circles are fitted to the rectified contours. Finally, the centers of the fitted circles are projected to the undistorted image plane or the image coordinate system, as required. The camera is first calibrated using the biased control point positions, the bias is removed, and the camera is calibrated anew. The results of this approach are as accurate as those of the first approach. For historical reasons, we use the first approach for area-scan cameras and the second approach for line-scan cameras.
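
As an illustration of the circle-fitting step of this second approach, here is a minimal sketch; it assumes the contour points have already been rectified to the plane of the calibration object, and it uses a simple algebraic least-squares circle fit rather than the fitting method of the cited implementations:

import numpy as np

def fit_circle(x, y):
    """Algebraic least-squares circle fit: solve x^2 + y^2 + a x + b y + c = 0."""
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    center = np.array([-a / 2.0, -b / 2.0])
    radius = np.sqrt(center @ center - c)
    return center, radius

# Synthetic rectified contour of a control point with center (1, 2) and radius 0.5.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
center, radius = fit_circle(1.0 + 0.5 * np.cos(t), 2.0 + 0.5 * np.sin(t))
# The unbiased control point position is then obtained by projecting `center`
# back into the undistorted image plane or the image coordinate system.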

To examine the potential accuracy improvement that can be achieved by the bias removal, we constructed a synthetic example in which we tried to maximize the perspective and distortion bias. We simulated a 20 megapixel camera with the same sensor specifications as a Canon EOS 6D and a very wide-angle 17 mm tilt/shift lens. To maximize the overall bias, we simulated a very large pincushion distortion. To maximize the perspective bias, 16 calibration images in which the calibration object was tilted by up to \(45^\circ \) and covered a large part of the field of view were simulated with an algorithm that assumes a perfect camera (no noise, 100 % fill factor, linear gray value response; see Steger et al. 2008, Chapter 3.7.4). This resulted in a diameter of the control points between 60 and 330 pixels in the calibration images, a perspective bias of up to 2.5 pixels (standard deviation (SD) \(\approx 0.2\) pixels), a distortion bias of up to 4.5 pixels (SD \(\approx 0.4\) pixels), and an overall bias of up to 6 pixels (SD \(\approx 0.5\) pixels). As can be seen from the standard deviations, despite the effort to maximize the bias, for most control points the bias is less than 1 pixel.

Since wide-angle lenses typically exhibit a barrel distortion, we also simulated this kind of distortion. This resulted in a perspective bias of up to 1.5 pixels (SD \(\approx 0.15\) pixels), a distortion bias of up to 1.1 pixels (SD \(\approx 0.1\) pixels), and an overall bias of up to 0.7 pixels (SD \(\approx 0.1\) pixels). As can be seen, the distortion bias of barrel distortions cancels the perspective bias to some extent. Therefore, it is important to note that barrel distortions do not simulate the worst case.

Table 1 Results of the bias removal

The ground truth camera parameters and the results of the calibration without and with bias removal are displayed in Table 1. In addition to the interior orientation parameters, Table 1 also lists the maximum translation and angle errors of the exterior orientations of the calibration object. They were calculated by inverting the calibrated exterior orientations and composing them with the true exterior orientations. This allows us to measure the errors in the coordinate system of the calibration object. The translation error is the length of the translation of the combined pose. To compute the angle error, the rotation of the combined pose is converted into an angle/axis representation and the absolute value of the angle is used as the angle error.
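
The error measures used in Table 1 can be computed as follows (a sketch, assuming each pose is given as a rotation matrix and translation vector that map calibration-object coordinates to camera coordinates):

import numpy as np
from scipy.spatial.transform import Rotation

def pose_errors(R_true, t_true, R_cal, t_cal):
    """Translation and angle error of a calibrated pose: invert the calibrated
    pose, compose it with the true pose, and evaluate the residual motion in
    the coordinate system of the calibration object."""
    R_res = R_cal.T @ R_true
    t_res = R_cal.T @ (t_true - t_cal)
    translation_error = np.linalg.norm(t_res)
    angle_error = np.linalg.norm(Rotation.from_matrix(R_res).as_rotvec())
    return translation_error, angle_error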

As can be seen from Table 1, the bias removal reduces the relative errors by 2–4 orders of magnitude in this example. A second iteration of the bias removal would only reduce the errors further by a factor of 2–3 and therefore is not worthwhile. We also note that the errors in the biased results are 1–2 orders of magnitude smaller if a barrel distortion of the same magnitude is used, while the errors in the unbiased results are up to one order of magnitude smaller.

The proposed approach to remove the bias is computationally very efficient. The calibration without bias removal requires 30 ms (excluding the time necessary to extract the control points in the calibration images). Including the bias removal increases the runtime by 700 ms. The transformation of the control point contours requires approximately 11 ms per image. In contrast, the approaches by Datta et al. (2009) and Vo et al. (2011) would necessitate transforming the calibration images, which would require approximately 650 ms per image in our software. In addition, the control point extraction would have to be executed again for the rectified images. This would require approximately 300 ms per image in our software. Therefore, the proposed approach is more than 20 times more efficient than the approaches by Datta et al. (2009) and Vo et al. (2011).

11 Experiments

The proposed camera models have been evaluated extensively on numerous lenses. Standard SLR lenses were attached to a Novoflex Balpro T/S bellows, equipped with Nikon F-mount lens and camera adapters. This setup allows us to turn any Nikon SLR lens into a tilt/shift lens. However, since the minimum extension of the bellows that facilitates a tilt of approximately \(5^\circ \) is about 3 cm, the bellows acts like an extension tube. Thus, for lenses with a short focal length, the achievable object distance becomes impractically small, which means that this setup only works for lenses with a sufficiently large focal length (larger than about 70 mm). Furthermore, we tested many commercially available SLR tilt/shift lenses. In addition, we constructed a special camera housing with a bellows that enabled us to attach most C-mount machine vision lenses to the housing and to tilt them around the vertical axis. In this section, we will present the most salient evaluation results.

Figure 18 shows a Sigma 150 mm F2.8 macro lens and a Nikon D3 camera attached to the Novoflex Balpro T/S bellows. The focus was set to a distance close to the minimum object distance of the lens. The lens has a focusing mechanism where the entrance pupil recedes within the lens as the focus is adjusted from infinity to the minimum object distance, whereas the exit pupil’s size grows and its position remains approximately unchanged. This suggests that the lens has a principal distance that differs significantly between the infinity and near focus settings. At the chosen focus setting, the entrance pupil appears significantly larger than the exit pupil. Table 2 displays the calibration results using this setup.

Fig. 18

The experimental setup used to obtain the results in Table 2. A Sigma 150 mm F2.8 macro lens and a Nikon D3 camera were attached to the Novoflex Balpro T/S bellows. The image also shows the calibration object

To obtain a baseline for the RMS error, an experiment with no tilt (\(\tau \approx 0\)) was performed. With the proposed camera model, this resulted in an RMS error of 0.12552. As described in Remark 6, d cannot be determined reliably if \(\tau \approx 0\). This is why d has converged to an essentially arbitrary value in this case. Table 2 shows that, with the proposed perspective tilt camera model, the RMS error only increases slightly if the camera is tilted by an angle of \(\tau \approx 6^\circ \) around the axes \(\rho \approx 90^\circ \) and \(\rho \approx 135^\circ \). The calibration results reflect that the entrance and exit pupils have different sizes. The image plane distance d is significantly smaller than the principal distance c. By (33), the pupil magnification factor of this lens at this particular focus setting is \(\approx \)0.3. Note that the principal distance c is much larger than the nominal 150 mm because of the focusing mechanism of the lens and because the bellows acts as an extension tube.

The middle and lower parts of Table 2 display the results of calibrating the camera with a tilt camera model that enforces equal ray angles in object and image space. In the middle part, \(s_x\) is fixed (enforcing square pixels), while \(s_x\) is optimized in the lower part. As predicted by Remark 13, if \(s_x\) is fixed, a large RMS error results if the lens is tilted in any direction. If \(s_x\) is optimized, the RMS error for the tilt around the vertical axis is much smaller (albeit not as small as for the proposed tilt camera model because the lens has distortions), while the RMS error of the diagonal tilt remains large. In both cases, a tilt angle \(\tau \) that deviates significantly from the nominal value is returned.

As a further example of an SLR lens, a Nikon AF-S Micro-Nikkor 105 mm f/2.8G was calibrated with a near focus setting, resulting in a pupil magnification factor of \(\approx \)0.73. Furthermore, two standard machine vision lenses were tested. For a Cosmicar/Pentax C1614-M 16 mm lens, a pupil magnification factor of \(\approx \)3 was obtained, while for a Cosmicar B1214D-2 12.5 mm lens it was \(\approx \)2.7. None of these SLR and machine vision lenses can be modeled correctly by camera models that force the ray angles in object and image space to be identical. A third machine vision lens, a Schneider Kreuznach BK-Makro CPN-S 50 mm lens, has a pupil magnification factor so close to 1 that it could be calibrated correctly with a traditional tilt camera model.

Table 2 Calibration results with a Nikon D3 camera and a Sigma 150 mm F2.8 macro lens attached to a Novoflex Balpro T/S bellows
Table 3 Calibration results with a Nikon D3 camera and a Nikon PC-E Nikkor 24 mm f/3.5D tilt/shift lens

When a Nikon PC-E Nikkor 24 mm f/3.5D tilt/shift lens (the lens that is shown in the right image of Fig. 5) was mounted on a Nikon D3 camera and calibrated with the perspective tilt lens model, the image plane distance d consistently converged to values \(\ge \) \(10^{60}\) m. Therefore, the lens was calibrated with the image-side telecentric tilt lens model. The tilt angle was adjusted to \(\approx \) \(0^\circ \) and \(\approx \) \(6^\circ \) using the tilt scale on the lens. The results are displayed in Table 3. From the top part of the table, we can see that the RMS errors increase somewhat if the lens is tilted. Nevertheless, the tilt angles are very consistent with the nominal tilt angles.

The middle and lower parts of Table 3 display the results of calibrating the lens with a camera model that enforces equal ray angles in object and image space. Again, as predicted by Remark 13, the RMS errors are large if the lens is tilted and \(s_x\) is fixed, while the error for the tilt around the vertical axis is smaller if \(s_x\) is optimized. As before, the tilt angles deviate significantly from their nominal values.

The fact that the Nikon 24 mm tilt/shift lens is image-side telecentric piqued our interest, and we calibrated further SLR tilt/shift lenses. It turned out that the Nikon PC-E Micro Nikkor 45 mm f/2.8D, the Nikon PC-E Micro Nikkor 85 mm f/2.8D, the Canon TS-E 17 mm f/4L, and the Canon TS-E 24 mm f/3.5L II lenses are all telecentric in image space. Furthermore, a calibration of the standard machine vision lens Pentax C815B showed that this lens, too, is image-side telecentric. Again, none of these lenses can be calibrated correctly by a tilt camera model that forces rays to have the same angle in object and image space.

As a final example, Table 4 displays the results of calibrating two object-side telecentric lenses. The lenses were mounted onto the special camera housing described at the beginning of this section. Apart from facilitating tilts around the vertical axis, the camera housing also allows the position of the image plane to be adjusted, thereby allowing the lens to be focused on object planes at different distances. The image plane was adjusted to a slightly larger distance than the nominal 17.526 mm of a standard C-mount lens to allow a large tilt of the lenses. Note that this increases the magnification of the lenses slightly. The camera uses an IDS uEye UI-1222LE sensor. The lenses were tilted around the vertical axis by the maximum angle the camera housing allowed (\(\approx 14^\circ \) in both experiments; because of the different image plane positions, the exact angle differed slightly between the experiments). The Vicotar T201/0.19 lens has a nominal magnification of 0.19. The nominal magnification of the V.S. Technologies L-VS-TC017 lens is 0.17. The calibrated tilt angles are close to the expected values. Furthermore, as expected, the calibrated magnifications are slightly larger than the nominal values. In this experiment, the division model was sufficient to model the lenses accurately.

Table 4 Calibration results with two object-side telecentric lenses: a Vicotar T201/0.19 and a V.S. Technologies L-VS-TC017 lens

12 Rectification of Stereo Images of a Perspective and a Telecentric Camera

With the results of the calibration approach discussed in Sects. 9 and 10, a stereo reconstruction can be performed. Stereo algorithms require that the images are rectified in such a manner that the epipolar lines in the rectified images are horizontal and at the same row coordinate. For pairs of perspective cameras and pairs of telecentric cameras, the geometry of the stereo rectification is well understood (Hartley and Zisserman 2003, Chapter 11.12). For a stereo image pair of a perspective and a telecentric camera, we could, in principle, use any of the homography-based algorithms that are capable of handling this configuration correctly, e.g., that of Gluckman and Nayar (2001). However, these approaches are purely based on 2D projective transformations, which gives us no insight into the 3D geometry of the problem. We will show below that the rectification of a perspective and a telecentric stereo image pair will lead to a kind of object-side telecentric lens that we have not considered so far.

Fig. 19

The geometry of rectifying a stereo image pair of a perspective and a telecentric camera. The perspective camera is described by its projection center \({{\varvec{o}}}_1\), its optical axis \({{\varvec{a}}}_1\), its image plane \({{\varvec{i}}}_1\), its principal point \({{\varvec{p}}}_1\), its principal distance c, and its camera coordinate system, of which only the \({{\varvec{z}}}_1\) axis is shown. The telecentric camera is described by its projection center \({{\varvec{o}}}_2\), lying in the plane at infinity \(\pi _\infty \), its optical axis \({{\varvec{a}}}_2\), its image plane \({{\varvec{i}}}_2\), its principal point \({{\varvec{p}}}_2\), and its camera coordinate system, of which only the \({{\varvec{z}}}_2\) axis is shown. As described in Remark 4, the origin of the camera coordinate system lies at a finite location. The relative orientation of the telecentric camera with respect to the projective camera is given by \({{{\mathbf {\mathtt{{R}}}}}}_{\mathrm {r}}\) and \({{\varvec{t}}}_{\mathrm {r}}\). The base of the stereo system is the line \({{\varvec{b}}}\) that connects \({{\varvec{o}}}_1\) and \({{\varvec{o}}}_2\). It intersects \({{\varvec{i}}}_2\) in the epipole \({{\varvec{e}}}_2\). The base \({{\varvec{b}}}\) is parallel to \({{\varvec{a}}}_2\). The rectifying image plane is parallel to \({{\varvec{b}}}\) and to \({{\varvec{a}}}_2\). Its normal is given by \({{\varvec{z}}}'_1\). The rectifying image plane, \({{\varvec{a}}}_2\), and \({{\varvec{b}}}\) all intersect in \({{\varvec{o}}}_2\). The rectified image of \({{\varvec{i}}}_1\) is \({{\varvec{i}}}'_1\), that of \({{\varvec{i}}}_2\) is \({{\varvec{i}}}'_2\). The rectification of the perspective camera image is performed by a standard rotation of the image plane around the projection center \({{\varvec{o}}}_1\) and, possibly, a change of the principal distance. The new principal point is denoted by \({{\varvec{p}}}'_1\). To rectify \({{\varvec{i}}}_2\), a perspective projection of \({{\varvec{i}}}_2\) onto \({{\varvec{i}}}'_2\) must be performed. Its projection center \({{\varvec{o}}}'_2\) must lie on b. The distance d of \({{\varvec{o}}}'_2\) from \({{\varvec{e}}}_2\) can be chosen arbitrarily. The principal point of \({{\varvec{i}}}'_2\) is given by \({{\varvec{p}}}'_2\). The distance t of \({{\varvec{i}}}'_1\) and \({{\varvec{i}}}'_2\) from \({{\varvec{b}}}\) can be chosen arbitrarily. Note that the rectified telecentric camera with principal point \({{\varvec{e}}}_2\), virtual image plane \({{\varvec{i}}}_2\), and image plane \({{\varvec{i}}}'_2\) is an object-side telecentric camera with an image plane tilted by \(90^\circ \) and shifted by t

The stereo geometry of a perspective and a telecentric camera is shown in Fig. 19. Without loss of generality, we can assume that the perspective camera is the first camera and that it lies to the left of the second, telecentric, camera. In principle, the stereo geometry is analogous to that of two perspective cameras. The base \({{\varvec{b}}}\) connects the two projection centers \({{\varvec{o}}}_1\) and \({{\varvec{o}}}_2\). The difference is that \({{\varvec{o}}}_2\) lies in the plane at infinity \(\pi _\infty \). Therefore, \({{\varvec{b}}}\) is parallel to the optical axis \({{\varvec{a}}}_2\) of the telecentric camera. The two images must be projected onto a common image plane that is parallel to \({{\varvec{b}}}\). Consequently, the rectified image plane, the base \({{\varvec{b}}}\), and the optical axis \({{\varvec{a}}}_2\) all intersect in \({{\varvec{o}}}_2\). The orientation of the rectified image plane has one degree of freedom: it can be rotated around \({{\varvec{b}}}\). We select the normal of the rectified image plane to point from \({{\varvec{o}}}_1\) to the point on \({{\varvec{a}}}_2\) that has the shortest distance from \({{\varvec{o}}}_1\). This results in smaller perspective distortions than other choices of the normal.

The rectification of the perspective camera is the usual rotation of the camera around \({{\varvec{o}}}_1\) and, possibly, a change of the principal distance. The new viewing direction of the rectified camera is \({{\varvec{z}}}'_1\), which is also the normal of the rectified image plane. The axis \({{\varvec{x}}}'_1\) of the rectified camera coordinate system of the perspective camera (not shown in Fig. 19) must be parallel to \({{\varvec{b}}}\). The axis \({{\varvec{y}}}'_1\) must be chosen as \({{\varvec{y}}}'_1 = {{\varvec{z}}}'_1 \times {{\varvec{x}}}'_1\).
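
As an illustration of this construction (not part of the proposed method), the following sketch computes the axes of the rectified perspective camera from the projection center of the perspective camera and the optical axis of the telecentric camera. It assumes that the optical axis \({{\varvec{a}}}_2\) is given by a point and a unit direction and that the base \({{\varvec{b}}}\) is parallel to this direction, as stated above; all names are illustrative.

```python
import numpy as np

def rectified_axes(o1, a2_point, a2_dir):
    """Axes (x1', y1', z1') of the rectified perspective camera, returned as the rows of
    a 3x3 matrix. z1' points from o1 to the closest point on the optical axis a2, x1' is
    parallel to the base b (and hence to a2), and y1' = z1' x x1'."""
    o1, a2_point, a2_dir = map(np.asarray, (o1, a2_point, a2_dir))
    a2_dir = a2_dir / np.linalg.norm(a2_dir)
    # Foot of the perpendicular from o1 onto the line a2_point + s * a2_dir.
    foot = a2_point + np.dot(o1 - a2_point, a2_dir) * a2_dir
    z1 = (foot - o1) / np.linalg.norm(foot - o1)   # normal of the rectified image plane
    x1 = a2_dir                                    # the base b is parallel to a2
    y1 = np.cross(z1, x1)                          # y1' = z1' x x1'
    return np.vstack((x1, y1, z1))
```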

The rectification of the telecentric camera is more interesting. We must perspectively project the image \({{\varvec{i}}}_2\) onto the rectified image plane, resulting in the image \({{\varvec{i}}}'_2\). Note that \({{\varvec{i}}}'_2\) is orthogonal to \({{\varvec{i}}}_2\), i.e., is rotated by \(90^\circ \) with respect to \({{\varvec{i}}}_2\). Clearly, the perspective center \({{\varvec{o}}}'_2\) of this projection must lie on the base \({{\varvec{b}}}\) for the epipolar lines to become parallel and horizontal. The intersection of \({{\varvec{b}}}\) with \({{\varvec{i}}}_2\) is the epipole \({{\varvec{e}}}_2\). The distance d of \({{\varvec{o}}}'_2\) from \({{\varvec{e}}}_2\) is a parameter that can be chosen freely. Small values of d lead to narrow rectified images, large values of d to wide images. We choose d such that \({{\varvec{i}}}'_2\) has the same width as \({{\varvec{i}}}'_1\) so that the perspective distortions in both rectified images are approximately equal. The distance t of the rectified image planes from the base is another free parameter. We simply set it to c, i.e., to the principal distance of the perspective camera. Other choices of \(t > 0\) are possible, but would simply scale the rectified images. Note that the rectified telecentric camera with principal point \({{\varvec{e}}}_2\), virtual image plane \({{\varvec{i}}}_2\), and image plane \({{\varvec{i}}}'_2\) is an object-side telecentric camera with an image plane tilted by \(90^\circ \) and shifted by t. Its optical axis \({{\varvec{b}}}\) is shifted from the optical axis of the original telecentric camera by the vector \({{\varvec{e}}}_2 - {{\varvec{p}}}_2\), where \({{\varvec{p}}}_2\) is the principal point of the original camera.

The camera model of the rectified telecentric camera cannot be represented by the model for object-side telecentric tilt cameras described in Sect. 7. The model of Sect. 7 describes the tilted image plane by three parameters in an affine parameterization. This is analogous to describing the tilted image plane by \(z = a x + b y + d\). Obviously, this parameterization can represent all planes except those that are perpendicular to the plane \(z = 0\). Therefore, we must extend the model for tilt homographies to handle image planes that are tilted by \(90^\circ \) and shifted by t with respect to the original principal point.

Fig. 20

The projection of a point \({{\varvec{p}}}_{\mathrm {u}}\) from the untilted image plane to a point \({{\varvec{p}}}_{\mathrm {t}}\) in an image plane that is tilted by \(90^\circ \). The camera is perspective in image space. The coordinate system of the untilted image plane is given by \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\), that of the tilted image plane by \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\). The untilted camera’s viewing direction is along the \({{\varvec{z}}}_{\mathrm {u}}\) axis, which points towards the scene. The vector \({{\varvec{n}}}\), which forms an angle \(\rho \) with \({{\varvec{x}}}_{\mathrm {u}}\), denotes the direction along which the two image planes intersect. The tilted image plane is shifted by the distance t in the direction perpendicular to \({{\varvec{n}}}\). These two coordinate systems can also be attached to the image-side projection center as \(({{\varvec{x}}}'_{\mathrm {u}}, {{\varvec{y}}}'_{\mathrm {u}}, {{\varvec{z}}}'_{\mathrm {u}})\) and \(({{\varvec{x}}}'_{\mathrm {t}}, {{\varvec{y}}}'_{\mathrm {t}}, {{\varvec{z}}}'_{\mathrm {t}})\). The distance from the projection center to the intersection of the optical axis with the untilted image plane is d

The geometry of projecting a point from an untilted image plane to an image plane that is tilted by \(90^\circ \) is shown in Fig. 20. Unlike for tilts smaller than \(90^\circ \), the origin of the tilted coordinate system \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) does not coincide with that of the untilted coordinate system \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\) because the optical axis intersects the tilted image plane at infinity. Therefore, we use the traditional principal point as the reference point of the tilted image plane. The geometry is equivalent to a camera that rotates around the projection center and changes its principal distance from d to t. Thus, the projection can be modeled by the homography

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{90^\circ } = {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {t}} {{{\mathbf {\mathtt{{R}}}}}}^\top {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}^{-1} , \end{aligned}$$
(59)

where \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}\) is given by (26) and \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {t}}\) is given by

$$\begin{aligned} {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {t}} = \left( \begin{array}{ccc} t &{} 0 &{} 0 \\ 0 &{} t &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(60)

The rotation matrix \({{{\mathbf {\mathtt{{R}}}}}}\) is given by expressing \(({{\varvec{x}}}_{\mathrm {t}}, {{\varvec{y}}}_{\mathrm {t}}, {{\varvec{z}}}_{\mathrm {t}})\) in terms of \(({{\varvec{x}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {u}}, {{\varvec{z}}}_{\mathrm {u}})\). Since \({{\varvec{z}}}_{\mathrm {u}}\) is the base of the stereo system, we obviously must choose \({{\varvec{x}}}_{\mathrm {t}}\) parallel to \({{\varvec{z}}}_{\mathrm {u}}\). One choice is to set \({{\varvec{x}}}_{\mathrm {t}} = {{\varvec{z}}}_{\mathrm {u}} = (0, 0, 1)^\top \). As shown in Fig. 20, we can set \({{\varvec{y}}}_{\mathrm {t}} = {{\varvec{n}}}\), where \({{\varvec{n}}}\) is given by (21). Finally, we can set \({{\varvec{z}}}_{\mathrm {t}} = {{\varvec{n}}}^\perp = (-\sin \rho , \cos \rho , 0)^\top \). A second choice is given by \({{\varvec{x}}}_{\mathrm {t}} = -{{\varvec{z}}}_{\mathrm {u}}, {{\varvec{y}}}_{\mathrm {t}} = -{{\varvec{n}}}\), and \({{\varvec{z}}}_{\mathrm {t}} = {{\varvec{n}}}^\perp \). Substituting the two choices into (59), we obtain two possible homographies:

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {90^\circ }} = \left( \begin{array}{ccc} 0 &{} 0 &{} t \\ \frac{t}{d} \cos \rho &{} \frac{t}{d} \sin \rho &{} 0 \\ -\frac{1}{d} \sin \rho &{} \frac{1}{d} \cos \rho &{} 0 \end{array} \right) \end{aligned}$$
(61)

and

$$\begin{aligned} {{{\mathbf {\mathtt{{H}}}}}}_{\mathrm {90^\circ }} = \left( \begin{array}{ccc} 0 &{} 0 &{} -t \\ -\frac{t}{d} \cos \rho &{} -\frac{t}{d} \sin \rho &{} 0 \\ -\frac{1}{d} \sin \rho &{} \frac{1}{d} \cos \rho &{} 0 \end{array} \right) \end{aligned}$$
(62)

The appropriate choice is (61) if the telecentric camera is to the left of the perspective camera and (62) otherwise. We also note that (62) with \(t < 0\) is identical to (61) with \(t > 0\), and vice versa. Therefore, we only need either (61) or (62) and can select the appropriate solution via the sign of t. The vector \({{\varvec{n}}}\) is given by the projection of \({{\varvec{z}}}'_1\) into the undistorted image plane of the telecentric camera, rotated by \(-\pi /2\).
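
To make the relationship between (59) and the closed forms concrete, the following sketch builds the rotation from the first choice of axes and checks that \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {t}} {{{\mathbf {\mathtt{{R}}}}}}^\top {{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}}^{-1}\) reproduces (61). It assumes \({{{\mathbf {\mathtt{{K}}}}}}_{\mathrm {u}} = \mathrm {diag}(d, d, 1)\) and that the columns of \({{{\mathbf {\mathtt{{R}}}}}}\) express the tilted axes in untilted coordinates; both are our reading of (26) and of the construction above, since (26) is not reproduced in this section.

```python
import numpy as np

d, t, rho = 0.02, 0.015, np.radians(30.0)   # arbitrary illustrative values

K_u = np.diag([d, d, 1.0])                  # assumed form of (26)
K_t = np.diag([t, t, 1.0])                  # (60)
# First choice of axes: x_t = z_u, y_t = n, z_t = n^perp, in untilted coordinates.
x_t = np.array([0.0, 0.0, 1.0])
y_t = np.array([np.cos(rho), np.sin(rho), 0.0])
z_t = np.array([-np.sin(rho), np.cos(rho), 0.0])
R = np.column_stack((x_t, y_t, z_t))        # columns: tilted axes in untilted coordinates

H_59 = K_t @ R.T @ np.linalg.inv(K_u)       # homography according to (59)
H_61 = np.array([[0.0, 0.0, t],
                 [t / d * np.cos(rho), t / d * np.sin(rho), 0.0],
                 [-np.sin(rho) / d, np.cos(rho) / d, 0.0]])
print(np.allclose(H_59, H_61))              # True
```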

With this, a point in the image plane \({{\varvec{i}}}_2\) is projected into the rectified image plane \({{\varvec{i}}}'_2\) by the following transformations. First, the point is transformed into the undistorted image plane of the telecentric camera by the inverse of (13), optionally the inverse of (24) or (30), depending on the camera model, and (10) or (12), depending on the distortion model. Next, the point is translated by the offset of the epipole \({{\varvec{e}}}_2\). Let the coordinates of the epipole in the undistorted image plane be denoted by \((e_x, e_y)^\top \). Since the coordinates of the principal point \({{\varvec{p}}}_2\) in the undistorted image plane are \((0,0)^\top \), the translation vector is \((-e_x, -e_y)^\top \). The translated point is then projected into the rectified image plane by (61) or (62). Finally, the projected point is transformed into the image coordinate system of the rectified image by (13). To transform points from \({{\varvec{i}}}'_2\) into \({{\varvec{i}}}_2\), which is required for image rectification, the inverses of the above transformations must be applied in the reverse order.
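
The chain of transformations can be summarized by the following sketch. The helper functions are hypothetical placeholders for the equations cited above: image_to_plane stands for the inverse of (13), optionally composed with the inverse of (24) or (30); undistort stands for (10) or (12); plane_to_image stands for (13) of the rectified camera. None of these names are part of the proposed interface.

```python
import numpy as np

def rectify_telecentric_point(p_img, image_to_plane, undistort, epipole, H_90, plane_to_image):
    """Map a point from the original telecentric image i_2 to the rectified image i_2'."""
    p = undistort(image_to_plane(p_img))      # point in the undistorted image plane of i_2
    p = np.asarray(p) - np.asarray(epipole)   # translate by (-e_x, -e_y)
    q = H_90 @ np.array([p[0], p[1], 1.0])    # project into i_2' by (61) or (62)
    q = q[:2] / q[2]
    return plane_to_image(q)                  # image coordinates of the rectified image
```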

There are some free parameters in the above transformations that still must be determined. First, to prevent the rectified images from becoming too wide, we select \(s_x\) and \(s_y\) such that the pixels in the rectified images are approximately the same size as the pixels in the original images. This is done by projecting a square of size 1 at the center of both rectified images back to the original images and using the inverse of the average of the resulting side lengths of the transformed squares as the scaling factors for \(s_x\) and \(s_y\), respectively. Next, we select the same \(c_y\) for both cameras, based on the intersection of the bounding boxes of the two rectified images, to ensure that the epipolar lines have the same row coordinate. Finally, the values of \(c_x\) are determined based on the individual bounding boxes of the two rectified images.
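
A sketch of the scale selection follows, assuming a function that maps a point from a rectified image back to the corresponding original image (a hypothetical helper composed of the inverse transformations described above). It computes a single per-image factor from the mean side length of the back-projected unit square, which is one reading of the description above.

```python
import numpy as np

def rectified_pixel_scale(rectified_to_original, center):
    """Scale factor for s_x and s_y of one rectified image: map a unit square around
    'center' back to the original image and return the inverse of the mean side length
    of the resulting quadrilateral."""
    offsets = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
    corners = [np.asarray(rectified_to_original((center[0] + dx, center[1] + dy)))
               for dx, dy in offsets]
    sides = [np.linalg.norm(corners[(i + 1) % 4] - corners[i]) for i in range(4)]
    return 1.0 / float(np.mean(sides))
```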

A synthetic example for the rectification of a stereo image pair of a perspective and a telecentric camera is shown in Fig. 21. The cameras are tilted by an angle of \(45^\circ \) with respect to each other. To show the epipolar lines for selected points before and after rectification, we use simulated images of a calibration object that we used previously (Steger et al. 2008, Chapter 3.9). This calibration object has fewer control points than the calibration object shown in Fig. 17, which allows us to display the epipolar lines corresponding to the centers of the control points. The images were created with synthetic camera parameters, i.e., the images in Fig. 21 are not used for calibration. They merely serve to have clearly identifiable points (the centers of the control points) for which the epipolar lines can be displayed. Figure 21 shows that the epipolar lines of the original images are slanted and those of the rectified images are horizontal and at the same row coordinates, illustrating that the rectification algorithm described above works correctly.

Fig. 21

A synthetic example for the rectification of a stereo image pair of a perspective and a telecentric camera. The image of the perspective camera, along with the epipolar lines through the centers of the control points, is shown in a, the image of the telecentric camera is shown in b. The corresponding rectified images and epipolar lines are shown in c, d

Fig. 22

A real example for the rectification of a stereo image pair of a telecentric and a perspective camera and for the corresponding 3D reconstruction. a Image of the telecentric camera, which looks perpendicularly at a PCB. b Image of the perspective tilt camera, which looks at the PCB at an angle of \(\approx \) \(37^\circ \). c Rectified telecentric image. d Rectified perspective image. e Computed stereo disparities. f Visualization of the computed metric 3D reconstruction. For better visibility, the reconstructed surface has been colored according to the z value of the reconstruction in the camera coordinate system of the telecentric camera. Darker surface colors correspond to points that are closer to the telecentric camera

A real example for the rectification of a stereo image pair of a telecentric and a perspective camera and for the corresponding 3D reconstruction is shown in Fig. 22. Both cameras were equipped with an IDS uEye UI-1222LE sensor. The telecentric camera was not tilted and used a V.S. Technologies L-VS-TC017 lens. The perspective camera used the special housing, described in Sect. 11, that allows the lens to be tilted around the vertical axis. The camera was equipped with a Cosmicar B1214D-2 12.5 mm lens with a 2 mm extension tube, tilted by \(\approx \) \(3.5^\circ \). The stereo setup was calibrated with the approach described in Sect. 9. The calibrated relative pose of the perspective camera with respect to the telecentric camera was computed as \(t_x = -\text {46.00 mm}, t_y = \text {6.41 mm}, t_z = \text {19.23 mm}, \alpha = \text {3.38}^\circ , \beta = \text {37.39}^\circ \), and \(\gamma = -\text {2.66}^\circ \). Figure 22a, b shows two images of a printed circuit board (PCB) acquired with this setup. The telecentric camera looks perpendicularly onto the PCB, the perspective camera at an angle of \(\approx \) \(37^\circ \). Figure 22c, d displays the rectified images. To show that the calibration and rectification worked correctly, the stereo disparities computed from the two rectified images are shown in Fig. 22e. A perspective view of the metric 3D reconstruction is displayed in Fig. 22f. Note that all components on the PCB (ICs, resistors, and capacitors) have been reconstructed correctly. The surfaces of the components and the PCB itself are perfectly flat. There are a few mismatches that are caused by specular reflections, mainly on the leads of the IC, and by the large occlusion in the lower left part of the perspective image. Overall, the reconstruction is of very high quality, showing that the calibration and rectification work correctly.

Remark 21

It might appear that (61) and (62) lead to a completely new camera model for object-side telecentric tilt cameras. However, this is not the case. If (61) or (62) are inserted into (54), a camera matrix \({{{\mathbf {\mathtt{{M}}}}}}\) that lies in the set D of Lemma 2 in Appendix A.3 is obtained. Computing the DIAC of such a camera matrix \({{{\mathbf {\mathtt{{M}}}}}}\) results in:

$$\begin{aligned} \mathbf {\omega }^*_{{{{\mathbf {\mathtt{{H}}}}}}_{\text {90}^\circ }} = \begin{pmatrix} \frac{c_x^2}{d^2} &{} \frac{c_x c_y}{d^2} &{} \frac{c_x}{d^2} \\ \frac{c_x c_y}{d^2} &{} \frac{c_y^2+a_y^2 t^2}{d^2} &{} \frac{c_y}{d^2} \\ \frac{c_x}{d^2} &{} \frac{c_y}{d^2} &{} \frac{1}{d^2} \end{pmatrix}. \end{aligned}$$
(63)

The DIAC is independent of \(\rho \) and tautologically fulfills equations (105) and (106). Equation (107), however, evaluates to \(a_y^2 t^2 / d^4\) and, therefore, is only fulfilled if \(a_y = 0, t = 0\), or \(d = \infty \). These cases, however, cannot occur since we must require \(t \ne 0\) and \(d < \infty \) for \({{{\mathbf {\mathtt{{H}}}}}}_{\text {90}^\circ }\) and \(a_y \ne 0\) for \({{{\mathbf {\mathtt{{K}}}}}}\), and therefore \({{{\mathbf {\mathtt{{M}}}}}}\), to be regular. Thus, the decomposition result from Lemma 2 can be applied. In particular, the matrix \({{{\mathbf {\mathtt{{M}}}}}}\) that arises from the rectification can be interpreted as an object-side telecentric tilt camera with \(\cos \rho = 0\).

As an example, consider an object-side telecentric tilt camera for rectification, using (62) as the tilt homography, with interior orientation parameters \(m = 1, d = 0.02, t = 0.015, \rho = 300^\circ , s_x = s_y = 2 \times 10^{-5}, (c_x, c_y)^\top = (1500, 1000)^\top \), and a pose given by \(\alpha = 30^\circ , \beta = -20^\circ , \gamma = 10^\circ \), and \((t_x, t_y, t_z)^\top = (0.158, 0.078, 0)^\top \). By the decomposition in the proof of Lemma 2, this camera can be represented by a regular object-side telecentric tilt camera with parameters \(m = 1, d = 0.01118034, \tau = 48.189685^\circ , \rho = 90^\circ , s_x = 1 \times 10^{-5}, s_y = 1.333333 \times 10^{-5}, (c_x, c_y)^\top = (0, 1000)^\top \), and a pose given by \(\alpha = 16.164849^\circ , \beta = -32.081247^\circ , \gamma = -20.733967^\circ \), and \((t_x, t_y, t_z)^\top = (0.165832, -0.011450, 0)^\top \). Needless to say, the parameters of the regular object-side telecentric tilt camera are extremely unintuitive. Therefore, while Theorem 3 and, in particular, Lemma 2 show that no special camera model for object-side telecentric tilt cameras for rectification would be required, it is preferable to have an explicit model for rectification because this facilitates an intuitive interpretation of the geometric parameters.

Remark 22

The above discussion shows that the rectified telecentric camera can be regarded as an object-side telecentric tilt camera. Obviously, the rectified perspective camera can also be regarded as a perspective tilt camera. This also applies in the case of two perspective cameras. For a stereo system with two telecentric cameras, the images also must be projected onto a common image plane. This projection can be represented by rotations around the optical axes of the two cameras since the only property that must be fulfilled is that the image rows are parallel to the direction that connects the two projection centers in the plane at infinity. The fact that, seemingly, there is no tilt involved is a consequence of Proposition 6. The tilt that would occur in the projection onto the common image plane can be compensated by changing \(s_x\) appropriately. Since \(s_x\) can be chosen freely anyway, the rectification by a rotation around the optical axis is effectively a tilt that is undone by virtually selecting a smaller \(s_x\). This shows that, for all combinations of camera types, stereo rectification involves cameras with tilt lenses in a very natural manner.

13 Conclusions

We have proposed models for cameras with tilt lenses that correctly model the imaging geometry of lenses for which the ray angles in object and image space differ. The models cover all lens types that are in common use for machine vision and consumer cameras: entocentric, image-side telecentric, object-side telecentric, and bilateral telecentric lenses. We have shown that the tilt can be modeled by orthographic or perspective homographies that project points from an untilted virtual image plane to the tilted image plane. An important aspect of the parameterization of the tilt homographies is that their parameters are easy to understand for the user. Furthermore, we have analyzed the degeneracies of the proposed camera models and have described how they can be handled automatically or manually by the user. The analysis of the degeneracies has also led to theorems that show that all finite projective cameras, affine cameras, and general cameras at infinity, i.e., all projective cameras, can be regarded as cameras with tilt lenses. Each theorem provides a minimal parameterization of the respective class of camera matrices. In addition, we have proposed two efficient and accurate algorithms to remove the perspective and distortion bias from the positions of circular control points, and thus to increase the accuracy of the calibration results. The proposed algorithms are at least an order of magnitude faster than the existing approaches that are capable of compensating perspective and distortion bias. Experiments have verified that the proposed camera models achieve accurate calibration results on numerous configurations of cameras and tilt lenses. The experiments have also shown that previously proposed camera models are unable to model many existing lenses correctly. Finally, we have described a geometric algorithm to rectify a stereo image pair of a perspective and a telecentric camera. This has led us to an interesting subclass of the tilt camera models: an object-side telecentric camera with an image plane that is tilted by \(90^\circ \).