
Looking Through the Glass

  • Annalisa Crannell
Living reference work entry

Abstract

Projective geometry allows us, as its name suggests, to project a three-dimensional world onto a two-dimensional canvas. A perspective projection often includes objects called vanishing points, which are the images of projective ideal points; the geometry of these points frequently allows us to either create images or to reconstruct scenes from existing images. We give a particular example of using a pair of vanishing points to locate the position of the artist Canaletto as he painted the Clock Tower in the Piazza San Marco. However, because mappings from three-dimensional space to a two-dimensional plane are not invertible, we can also use perspective and projective techniques to create and analyze illusions (e.g., anamorphic art, impossible figures, the dolly zoom, and the Ames room). Moving beyond constructive (e.g., ruler and compass) projective geometry into analytical projective geometry via homogeneous coordinates allows us to create and analyze digital perspective images. The ubiquity of digital images in the present day allows us to ask whether we can use two (or many) images of the same object to reconstruct that object in part or in entirety. Such a question leads us into the emerging field of multiple view geometry, straddling projective geometry, algebraic geometry, and computer vision.

Keywords

Linear perspective, Multiple view geometry, Projective geometry, Anamorphism

Introduction

This chapter is about perspective art, and in particular about the role that projective geometry plays in perspective art. Most people are aware that perspective techniques began to flourish during the Renaissance, and as a result drawings and paintings of that era became demonstrably more “realistic” or “lifelike” than art in previous eras. Now we are living through a similar Renaissance, especially in the technological realm (which includes our animated movies, video games, medical imaging, and more). The mathematics that transformed our world several centuries ago still flourishes around us; it continues to have relevance and power in the way we look at the world today.

The word perspective comes from the Medieval Latin roots per (“through”) and specere (“look at” – the same root that gives us “spectacles”). So the title of this chapter is a deliberate pun: like the book written in 1871 by the mathematician Charles Dodgson under his pen name, Lewis Carroll (1871), perspective art literally intends us to look through a window to see the objects it portrays lying on the other side. And as Carroll’s book suggests, sometimes the view that we get by looking through the glass will give us glimpses of the world that are surprising – even wonderful – feats of illusion and magic.

A Brief History

There is a lore that projective geometry has been a subject intimately connected with, and arising from, the development of perspective art. That lore is not entirely in accordance with historical fact. (For a much more comprehensive description of the history of perspective art than this chapter can provide, see Andersen’s excellent volume Andersen 2006.)

The formal introduction of linear perspective is generally credited to Filippo Brunelleschi, an Italian designer, architect, and engineer who lived 1377–1446 (See also “Renaissance Architecture”). His perspective demonstrations relied extensively on geometry but also on physical apparatuses – he interposed mirrors between his canvas and the pictured scenes to validate the accuracy of his images. Brunelleschi’s work had an almost immediate influence on Leon Battista Alberti (1404–1472), an Italian polymath (architect, priest, artist, and author). In 1435, Alberti published Della pittura, his seminal work on perspective, whose influence reached far and wide.

For two centuries, perspective art remained largely in the arena where Brunelleschi and Alberti had placed it: as an exercise in Euclidean geometry and engineering. The German mathematician and astronomer Johannes Kepler (1571–1630) may have been the first person to introduce the projective notion of “points at infinity.” However, Kepler’s motivation arose not from perspective art, but rather from developing a unified theory of conics (e.g., “closing up” the parabola).

In the early-to-mid 1600s, Girard Desargues (1591–1661) published a series of short works, some in perspective art (notably, Desargues 1987) and others in projective geometry. Like Brunelleschi and Alberti, Desargues was a mathematician and engineer. The theorem that bears his name to this day appears in a work of homage by his contemporary, Abraham Bosse (1648). Desargues’s theorem states that “A pair of triangles perspective from a point is also perspective from a line.” This theorem does indeed have perspective art interpretations: see Fig. 1, which depicts a lamp casting a shadow. In this figure, the corresponding vertices of two triangles are collinear from the bulb of the lamp (a point), and the corresponding edges of the triangles are coincident with the line where the glass meets the ground. But it is not clear that the theorem was directly motivated by a similar situation; in Bosse’s manuscript, the formulation of Desargues’s theorem is separated from his description of Desargues’s work in perspective; the diagram and proof are both highly abstract.
Fig. 1

A perspective interpretation of Desargues’s theorem
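The statement of Desargues’s theorem can be checked numerically on a single example. The sketch below is not taken from Bosse or Desargues; it assumes planar points written in homogeneous form (x, y, 1), the coordinates discussed in the section “Homogeneous Coordinates”, and it uses the cross product both to join two points into a line and to intersect two lines. The two triangles and the helper names (`cross`, `meet`, `det3`, `desargues_axis_points`) are hypothetical choices made for illustration.

```python
def cross(u, v):
    """Cross product of two 3-vectors (joins two points or meets two lines)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def meet(p, q, r, s):
    """Homogeneous coordinates of the point where line pq meets line rs."""
    return cross(cross(p, q), cross(r, s))


def det3(u, v, w):
    """Determinant of the matrix with rows u, v, w; zero when they are collinear."""
    return sum(a * b for a, b in zip(u, cross(v, w)))


def desargues_axis_points(tri1, tri2):
    """Intersections of the three pairs of corresponding edges."""
    (a, b, c), (a2, b2, c2) = tri1, tri2
    return meet(a, b, a2, b2), meet(b, c, b2, c2), meet(c, a, c2, a2)


# Hypothetical example: corresponding vertices lie on lines through (0, 0),
# so the triangles are perspective from that point.
triangle = ((1, 0, 1), (0, 1, 1), (1, 1, 1))
shadow = ((3, 0, 1), (0, 2, 1), (4, 4, 1))

x, y, z = desargues_axis_points(triangle, shadow)
print("edge intersections:", x, y, z)
print("collinear:", det3(x, y, z) == 0)
```

Integer coordinates keep the arithmetic exact, so the collinearity test is a true equality rather than a floating-point approximation. One example is of course an illustration, not a proof.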

Desargues’s work seems to have been lost or neglected in the period that followed, possibly because the algebraic approach to geometry put forward by his contemporary, René Descartes, proved more versatile. A century later, for example, the artist Canaletto (whom we will return to in section “Where Was the Camera?”) was creating his paintings with the camera obscura rather than with geometry. Across the Channel, the English mathematician Brook Taylor (of Taylor’s series fame) would publish his highly celebrated “New Principles of Linear Perspective: or the Art of Designing on a Plane the Representations of all sorts of Objects, in a more General and Simple Method than has been done before” (Taylor 1719). But in spite of the promise of the first word of this title, the book contained very little that was “new”; it relied almost exclusively on Euclidean geometry (moreover, it was often described as far from “simple” to read).

Two centuries after Desargues introduced projective geometry, another French engineer and mathematician – Jean-Victor Poncelet (1788–1867) – resurrected it. Famously, Poncelet wrote much of what would become his “Traité des propriétés projectives des figures” during a two-year imprisonment; he had been captured during Napoleon’s campaign against the Russian Empire. Poncelet’s geometry was axiomatic and theoretical, and was not explicitly motivated by, nor applied to, perspective art.

The centuries that followed have seen projective geometry take a variety of forms. Perhaps farthest from perspective art is the subfield of finite projective geometry, with points and lines abstracted (as in the Fano plane, Fig. 2).
Fig. 2

The Fano plane contains seven points, each incident with three “lines”, and seven “lines”, each incident with three points
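The incidence counts in this caption can be verified mechanically. The sketch below assumes one standard labeling of the seven points by the integers 1 through 7; Fig. 2 may label them differently, and the names `FANO_LINES`, `lines_through`, and `is_fano_plane` are hypothetical.

```python
from itertools import combinations

POINTS = range(1, 8)

# One standard labeling of the seven "lines" (an assumption for illustration).
FANO_LINES = [
    frozenset({1, 2, 3}), frozenset({1, 4, 5}), frozenset({1, 6, 7}),
    frozenset({2, 4, 6}), frozenset({2, 5, 7}),
    frozenset({3, 4, 7}), frozenset({3, 5, 6}),
]


def lines_through(point):
    """All lines incident with the given point."""
    return [line for line in FANO_LINES if point in line]


def is_fano_plane():
    """Check the incidence properties of the Fano plane for this labeling."""
    seven_lines = len(FANO_LINES) == 7
    three_points_per_line = all(len(line) == 3 for line in FANO_LINES)
    three_lines_per_point = all(len(lines_through(p)) == 3 for p in POINTS)
    # Any two points are joined by exactly one line ...
    unique_join = all(
        sum(1 for line in FANO_LINES if p in line and q in line) == 1
        for p, q in combinations(POINTS, 2)
    )
    # ... and any two lines meet in exactly one point (no parallel lines).
    unique_meet = all(len(l & m) == 1 for l, m in combinations(FANO_LINES, 2))
    return (seven_lines and three_points_per_line and three_lines_per_point
            and unique_join and unique_meet)


print(is_fano_plane())
```

The last check, that every pair of lines meets, is the finite analogue of the projective property that parallel lines do not exist.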

But fittingly, given the coincident geometric contributions of Desargues and Descartes, it is in the realm of analytical projective geometry where we see recent, exciting applications to perspective images, as well as to reconstructing the objects that make those images. In the sections that follow, we build from perspective applications of “traditional” (ruler and compass) projective geometry toward these analytical applications.

A New Mathematical Object: The Point of Projective Geometry

Traditional perspective art assumes that there is an artist looking with one eye through a window or canvas at the world. We call the location of the viewer’s eye the center of the projection and denote it by the point O; we’ll denote the picture plane by the Greek letter ρ, and the image of a real-world point X on the canvas ρ we’ll denote by the symbol X′.

There are other physical setups that give us similar projections on planes. For example, a camera might have a lens or pin-hole that projects objects in the real world onto a sheet of film or a set of pixels; again, we call the lens the center O of the projection, with the film lying in a plane ρ and the object and its image similarly denoted by X and X′, respectively. Or we might have a light source casting a shadow on the ground; the light source in this case would play the role of the center O; the ground becomes the image plane ρ, and the object and its shadow are X and X′.

What all these situations have in common is that the points O, X, and X′ are collinear and that X′ is the intersection of the line through O and X with the plane ρ. (In shorthand mathematical notation, we write X′ = (OX) ⋅ ρ.)
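As a concrete illustration of X′ = (OX) ⋅ ρ, here is a minimal sketch that intersects the line through O and X with a plane. It assumes the plane is described by one point on it and a normal vector; that representation, and the function name `project`, are choices made for this example rather than notation from the chapter.

```python
def project(center, point, plane_point, plane_normal):
    """Return X' = (OX) . rho as a tuple, or None if OX does not cross rho.

    center       -- the center of projection O
    point        -- the real-world point X
    plane_point  -- any point lying in the picture plane rho (assumed input)
    plane_normal -- a vector perpendicular to rho (assumed input)
    """
    direction = [x - o for x, o in zip(point, center)]
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if denom == 0:
        # OX is parallel to rho (or X coincides with O): there is no
        # single Euclidean intersection point.
        return None
    numer = sum(n * (p - o)
                for n, p, o in zip(plane_normal, plane_point, center))
    t = numer / denom
    return tuple(o + t * d for o, d in zip(center, direction))


# Hypothetical setup: eye at the origin, picture plane z = 1.
O = (0, 0, 0)
print(project(O, (2, 4, 2), (0, 0, 1), (0, 0, 1)))
print(project(O, (3, 1, 0), (0, 0, 1), (0, 0, 1)))
```

The second call returns `None` because the line of sight to (3, 1, 0) never reaches the plane z = 1 in ordinary Euclidean space.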

This simple notion runs into difficulties, however, if the point X lies in an “awkward” place: if the line OX is parallel to the plane ρ, then the intersection (OX) ⋅ ρ is empty (at least in the usual realm of Euclidean geometry). Fortunately for artists, this situation does not seem to arise often; if an artist wanted to draw her feet (which presumably are directly below her eye), she would tilt the picture plane rather than leaving the canvas vertical. A much more frequent artistic conundrum is that sometimes the image X′ appears to exist even though the object X does not: this situation arises in the case of the well-known vanishing point. The vanishing point where the two railroad tracks appear to meet together on the horizon plays an extremely important role in a perspective picture, even though there is no such point in the real world.

Ideal Points

To counteract both of the above difficulties with a single solution, mathematicians expanded the notion of Euclidean space to a larger space; if we use analytic properties such as coordinates in this space, we call it “projective space” (\(P^3({\mathbb {R}})\)), or if we use purely geometric properties, we call it “Extended Euclidean space” (\(\mathbb {E}^3\)). This larger space includes not only all the familiar points in \(\mathbb {R}^3\), but also an additional set of points called ideal points (or sometimes points at infinity). In the spaces \(P^3({\mathbb {R}})\) and \(\mathbb {E}^3\), we must alter our conception of parallel lines; in particular, lines in \(\mathbb {R}^3\) that are parallel meet in \(\mathbb {E}^3\) at an ideal point. We will delve further into \(P^3({\mathbb {R}})\) in section “Homogeneous Coordinates”; until then, this text will only need the geometric properties of \(\mathbb {E}^3\).

Ideal points are created by what we call a formal definition, meaning that the definition itself “forms” the object. This kind of definition is different from one that merely identifies an existing object: we could define \(\sqrt {2}\) to be “the positive real number x with the property that \(x^2 = 2\).” The definition of \(\sqrt {2}\) is not a formal definition, because such a number already exists in \(\mathbb {R}\). But the definition of ideal points creates something new, in the same way that defining the imaginary number i to be “a number z with the property that \(z^2 = -1\)” creates something that does not exist in \(\mathbb {R}\), leading to the formation of the complex plane \(\mathbb {C}\). In the same way, the space \(\mathbb {E}^3\) is larger than and has different properties from \(\mathbb {R}^3\).

In particular, in \(\mathbb {E}^3\), every line and plane intersect in a point (unless the line is a subset of the plane, in which case their intersection is a line). This means that if the center O does not lie in the image plane ρ and if X ≠ O, the image point X′ = (OX) ⋅ ρ is always well defined.

Similarly, two lines in \(\mathbb {E}^3\) are coplanar if and only if they intersect in exactly one point. In this sense, as we noted above, “parallel” lines are coplanar and intersect in an ideal point. Artistically speaking, the existence of ideal points as the intersection of parallel lines allows us to say that if X′ is a vanishing point in our picture, then the object X that it portrays exists and is a point “at infinity.”

Because vanishing points play such a crucial role in understanding perspective pictures, it is worth looking at these objects more carefully.

Vanishing Points

In the same way we say someone is a “parent” when that person is the parent of some child or group of children, a “vanishing point” is always a vanishing point of some line or collection of lines. An examination of Fig. 3 shows that the line \(\ell\) appears to vanish when the artist at point O is looking parallel to \(\ell\); it follows that a point V ∈ ρ is the vanishing point for the line \(\ell\) if and only if
$$\displaystyle \begin{aligned}OV \parallel \ell . \end{aligned}$$
As we noted above, the vanishing point V is the image of the ideal point (the point at infinity) on \(\ell\).
Fig. 3

Points on the line \(\ell\) project to the plane ρ from the center O. Points A and B project to A′ and B′ like a camera with O like a lens; points C and D project to C′ and D′ like shadows with O like a light source; points E and F project to E′ and F′ like drawing on a window, with O like the artist’s eye. The line OV is parallel to \(\ell\); we say V is the vanishing point of \(\ell\)

It follows that if several lines \(\ell _1, \ell _2, \ell _3, \ldots \) are parallel to one another, then the line OV is parallel to all of them if and only if OV is parallel to any one of them, so the lines \(\ell _1, \ell _2, \ell _3, \) etc. all have the same vanishing point. If the lines \(\ell _1, \ell _2, \ell _3, \) etc. are parallel to one another but not parallel to the picture plane, it follows that V is a real (rather than ideal) point, so their images \(\ell _1^{\prime }, \ell _2^{\prime }, \ell _3^{\prime }, \) etc. are not parallel but rather all intersect at that point V (giving us, e.g., the drawing of the railroad tracks that converge at a point in the horizon). If the lines are parallel to one another and also parallel to the picture plane, then OV is likewise parallel to the picture plane, implying V is an ideal point, and so the images \(\ell _1^{\prime }, \ell _2^{\prime }, \ell _3^{\prime }, \) etc. will all be parallel to each other (as well as to the original lines).
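The convergence of parallel lines to a common vanishing point can be seen in a small computation. The sketch below assumes a specific setup that the chapter does not fix at this stage: the center O is at the origin and the picture plane is z = 1. The two “rails” and all function names are hypothetical.

```python
from fractions import Fraction


def image(point):
    """Project an integer point (x, y, z), z != 0, from the origin onto z = 1."""
    x, y, z = point
    return (Fraction(x, z), Fraction(y, z))


def vanishing_point(direction):
    """Where the line through O with this direction meets z = 1.

    Returns None when the direction is parallel to the picture plane,
    in which case the vanishing point is an ideal point.
    """
    a, b, c = direction
    if c == 0:
        return None
    return (Fraction(a, c), Fraction(b, c))


def point_on_line(base, direction, t):
    """The point base + t * direction."""
    return tuple(p + t * d for p, d in zip(base, direction))


# Two parallel rails running straight away from the viewer.
direction = (0, 0, 1)
rails = [(-1, -2, 2), (1, -2, 2)]
V = vanishing_point(direction)

for base in rails:
    for t in (0, 10, 1000):
        print(base, t, image(point_on_line(base, direction, t)))
print("vanishing point:", V)
```

As t grows, the images of points on both rails approach the same point V, even though no real-world point on either rail projects exactly to V.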

Note that this definition of vanishing point implies something significant about interpreting a piece of art. If we know something about a set of lines (say, we can infer that the lines in the road were running perpendicularly to the canvas), and we can locate the vanishing points of those lines on the canvas, then this means we know something about the location O of the artist, and this location is something we explore further in the next section.

Where Was the Camera?

In the previous section, we claimed that the location of vanishing points helps us determine the location of the artist or camera that made the picture. In this section, we explore the implications of this claim. Determining the location of an artist or of a camera is the source of a good amount of mathematical inquiry (see, e.g., references Byers and Henle 2004, Crannell 2006, Futamura and Lehr 2017, Robin 1978, and Tripp 1987). Moreover, the methods for solving this question lead to multiple applications, as we will see in the sections that follow (See also “Geometries of Light and Shadows from Piero della Francesca to James Turrell”).

Here we give one simple example of using geometry to locate the original position of an artist: a standard, back-of-the-envelope calculation that uses two vanishing points to determine the original viewing location. Figure 4 shows a painting from circa 1730 of the Clock Tower in the Piazza San Marco, a noted tourist attraction. The artist, Giovanni Antonio Canal (better known as “Canaletto”), was noted for his realistic cityscapes; he often used a camera obscura to project images onto canvas where he would capture them in paint. As such, his works give us excellent examples of perspective projections.
Fig. 4

“The Clock Tower in the Piazza San Marco”, Canaletto (circa 1730)

In Fig. 4 we can see that vertical lines in the Piazza have vertical images on Canaletto’s canvas. Likewise, the horizontal lines in the front face of the clock tower building also have horizontal images. This tells us that Canaletto’s canvas was set up parallel to that face of the building. We can deduce that a third set of lines depicted in the picture runs perpendicularly to the canvas. Figure 5 shows that these lines have images that converge at a point V in the second floor of the building, near the main doorway and below the clock in the painting.
Fig. 5

The vanishing point V of those lines that are perpendicular to the canvas shows that Canaletto painted this canvas from a second-story location. The vanishing point D of the diagonal line of a vertical rectangle lies directly above V

Because this third set of real-world lines is perpendicular to the canvas, it follows that Canaletto was perpendicularly across from the point V depicted in the picture – in other words, he was not standing on the ground, but was stationed on the second floor of another building.

But we can be even more specific. The picture also contains clues that help us deduce his horizontal distance from the clock tower. On the right side of the plaza is a building with semicircular arches. Around one of these arches, we can draw the image of a rectangle that is twice as long as it is high. We draw the image of the diagonal line through this rectangle, which has vanishing point D directly above V (see Figs. 5 and 6). Because the slope of the real-world line is 1/2, the geometry of similar triangles allows us to deduce that the viewing distance (the distance from Canaletto to the canvas) is twice the length of the segment \(\overline {VD}\). Assuming the clock tower to be approximately 70 ft tall (based on its height relative to the people in the picture), we get that the height of the clock tower appears to be 35% the length of \(\| \overline {VD} \|\), so Canaletto was approximately 200 ft from the clock tower.
Fig. 6

A side view showing the location O of the artist and the picture plane ρ. The point V ∈ ρ is both the image of a point C on the second floor of the clock tower and also the vanishing point for lines perpendicular to the picture plane ρ. The line d has vanishing point D ∈ ρ, so OD ∥ d; therefore, the slope of OD is 1/2
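The two similar-triangle steps in this argument can be written as short formulas. The sketch below is a hedged illustration, not a measurement of the painting: the numeric inputs are made up, the function names are hypothetical, and the second step assumes the object stands in a plane parallel to the canvas so that its real height and its image height scale by the same ratio as their distances from O.

```python
def viewing_distance(vd_length, diagonal_slope):
    """Distance |OV| from the eye to the canvas.

    Because OD is parallel to the real-world diagonal d, the slope of OD
    equals the slope of d, so |VD| / |OV| = diagonal_slope.
    """
    return vd_length / diagonal_slope


def distance_to_object(real_height, image_height, view_dist):
    """Distance from the eye to an object standing parallel to the canvas.

    Similar triangles with apex O give
    real_height / distance = image_height / view_dist.
    """
    return real_height * view_dist / image_height


# Illustrative numbers only (canvas lengths in centimetres, heights in feet).
vd = 10.0        # measured length of segment VD on the canvas
slope = 0.5      # rectangle twice as long as it is high, as in the text
d = viewing_distance(vd, slope)
print("viewing distance:", d)
print("distance to object:", distance_to_object(6.0, 3.0, d))
```

With a slope of 1/2 the viewing distance comes out to twice |VD|, matching the deduction in the text; the distance to the object then depends only on the ratio of its real height to its image height.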

In conclusion, a few standard assumptions about Canaletto’s world (buildings were constructed with right angles, the arches were semicircles, and people were approximately the same height they are today) allow us to reconstruct the location of that artist as he painted this picture 300 years ago.

A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism

Understanding where the artist stood is more than a historical exercise; it also has the power to affect how we view photographs and the apparent distortion within them. Almost every person has had the experience of seeing a breathtaking vista and trying to capture it on camera, only later to lament that the photograph didn’t do justice to the power of the original view. Often, the problem is not with the mechanics of the photograph or the photographer, but with the small size of the image coupled with the too-far distance of the person looking at the photograph. If the photograph were larger, or if its viewer were closer, the sense of awe for the vista might return.

Figure 7 gives an example of why the size of a photograph, a movie screen, or a reproduction of a perspective painting matters. Good perspective artists often place their vanishing points far off the picture because doing so “reduces distortion.” In Fig. 7, we have instead sized the drawing in such a way that the vanishing points are readily apparent on the page (like consolidating a magnificent scenic view into a photo that is only as wide as a phone or a laptop). Notice that the word “LIFE” appears to be highly distorted. In particular, the bottom corner of the “L” has an angle in the drawing of 48°, even though this vertex is supposed to represent a right-angled corner. We could have made the corner appear more like a right angle by placing the vanishing points further apart. But surprisingly, we can also make the corner appear more like a right angle by moving ourselves closer to the drawing. If a viewer moves uncomfortably close to this picture (in particular, if a person looks with one eye from a location very close to the × on the horizon), the angles in the word appear to be correct, 90° angles.
Fig. 7

“LIFE” in two-point perspective, with vanishing points indicated on the horizon. The near bottom corner of the “L” has an angle of 48°, but if you look at the picture with one eye from very close to the × on the horizon line, the angle appears to be “correct”; that is, it appears to be 90°

Figure 8 explains why moving our eye close to the image helps the picture appear more realistic. The viewer at \(O_1\) is far from the image – just as most readers of this chapter will view “LIFE” in Fig. 7 from a comfortable distance. The lines of sight to the two vanishing points for the viewer at \(O_1\) form an acute angle θ. Recall that when an artist draws a scene through a window, the vanishing points in the picture plane will lie on those lines of sight that are parallel to the lines she is drawing in the “real world.” Therefore, for the viewer at \(O_1\), the drawing appears to depict an object that is likewise formed by the acute angle θ.
Fig. 8

A top view showing two viewers looking at the picture plane ρ. The lines of sight from the viewers to the two vanishing points are parallel to the lines of the objects they appear to see in the “real world”; hence the viewer at \(O_1\) sees a diamond-shaped object, while the viewer at \(O_2\) sees a rectangle

On the other hand, the viewer at \(O_2\) is closer to the picture plane, at a place where the lines of sight from \(O_2\) to the vanishing points are perpendicular. Therefore, for this viewer, the drawing appears to depict an object in the real world formed by lines that are likewise perpendicular to one another. In other words, if the drawing is supposed to depict an object with right angles, the closer viewer sees an “undistorted” picture, whereas the further viewer sees a distorted image.
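The top view of Fig. 8 can be turned into a small calculation. The sketch below assumes the viewer stands directly in front of the viewing target on the horizon, at distance `d` from the picture plane, with the two vanishing points at distances `a` and `b` on either side of the target; these symbols and function names are hypothetical. Under that assumption the two lines of sight are perpendicular exactly when d² = ab.

```python
import math


def apparent_angle(a, b, d):
    """Angle in degrees between the lines of sight to the two vanishing points.

    Top-view coordinates: viewing target at (0, 0), vanishing points at
    (-a, 0) and (b, 0), viewer at (0, d).
    """
    dot = d * d - a * b  # dot product of (-a, -d) and (b, -d)
    cosine = dot / (math.hypot(a, d) * math.hypot(b, d))
    return math.degrees(math.acos(cosine))


def right_angle_distance(a, b):
    """Viewing distance at which the two lines of sight are perpendicular."""
    return math.sqrt(a * b)


# Illustrative numbers: vanishing points 4 and 9 units from the target.
a, b = 4.0, 9.0
for d in (20.0, 10.0, right_angle_distance(a, b)):
    print(d, round(apparent_angle(a, b, d), 1))
```

A distant viewer gets an acute angle, and the angle opens toward 90 degrees as the viewer approaches the distance returned by `right_angle_distance`.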

The reason our photographs don’t capture what we remember seeing is not because the camera messed up; it is because we view the small photographs from too far away. Enlarging the photos or moving closer to the photos will restore the illusion of depth.

Dolly Zoom

Cinematographers make effective use of altered viewing distances to create a distinctive mood. Figure 9 shows one of the most effective and common of these: a movie camera technique called the dolly zoom. (The dolly zoom has many other names – including the Hitchcock zoom, because it first appeared in that director’s film Vertigo when it was pioneered by cameraman Irmin Roberts.) In this zoom, the camera is placed along a track and pulled backward while simultaneously zooming in on the figure in the foreground.
Fig. 9

Side views showing the camera up close, and then drawn back while zoomed in. Note that the image of the house in the distant background grows larger relative to the image of the person

The effect of this technique appears in Fig. 10. When the camera is close, even though the house and tree are large objects, they are far from the camera, so the nearby person seems relatively large compared to the background objects. But as the camera zooms in on the person’s face while simultaneously drawing backward, the background objects seem to swell in size. The effect is to make the world appear to loom large, giving that short scene a disturbingly ominous feeling.
Fig. 10

In the first figure, the camera is close, so the nearby person seems relatively large compared to the background objects; in the second figure, the camera zooms in on the person’s face while simultaneously drawing back, so that the background objects seem to swell ominously
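The swelling of the background can be quantified with a simple pinhole model, in which an object of height H at distance Z has image height f·H/Z. This model, the parameter name `focal_length` (here standing for the amount of zoom), and the numbers below are assumptions made for illustration; they do not describe any particular film camera.

```python
def image_height(focal_length, real_height, distance):
    """Pinhole model: image height of an object at the given distance."""
    return focal_length * real_height / distance


def dolly_zoom_focal_length(f0, subject_distance, pullback):
    """Focal length that keeps the subject's image size fixed.

    The camera starts at subject_distance with focal length f0 and is
    pulled back by `pullback`; zooming in by the same ratio compensates.
    """
    return f0 * (subject_distance + pullback) / subject_distance


# Illustrative scene: a person 2 units tall at distance 2,
# and a house 10 units tall at distance 50.
f0, person_h, person_z, house_h, house_z = 1.0, 2.0, 2.0, 10.0, 50.0

for pullback in (0.0, 8.0, 48.0):
    f = dolly_zoom_focal_length(f0, person_z, pullback)
    person = image_height(f, person_h, person_z + pullback)
    house = image_height(f, house_h, house_z + pullback)
    print(pullback, person, round(house, 3))
```

The person’s image height stays constant by construction, while the house’s image height grows with every step the camera takes backward.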

If the camera pulls back slowly (as in a diner scene in Goodfellas), the psychological effect is one of creeping unease. The audience is aware of something being not quite right, but can’t quite place the source of trouble. Often, however, the camera zooms back quickly: in The Lord of the Rings: The Fellowship of the Ring, as Frodo stands on a road, the accompanying dolly zooms last a fraction of a second, evoking a feeling of terror. It’s no surprise, then, that Michael Jackson’s Thriller video ends with a similar, speedy zoom! These sudden zooms are technically difficult and costly, but clearly they are worth the expense and effort to the directors of these films. See Boing Boing (2015), for example, for a video clip purporting to be “23 of the best dolly zooms in cinematic history.”

Of course, the effect can be reversed (even with a virtual camera); at the end of Fiona’s battle with Robin Hood’s men in the animated movie Shrek, there is a split-second reverse dolly zoom, giving the sudden impression that the battle is over and all is right with the world.

Anamorphic Art

The word “LIFE” in Fig. 7 looks moderately distorted because of the unusually close viewing distance, but the word is still recognizable because the viewing target (at the “×”) is centered on the horizon. That is, if we hold this picture in front of us, we’ll be centered on the viewing target; the distortion comes solely from the distance between our eye and that target.

Perspective techniques allow artists to create even more significant illusions by locating the viewing target close to an edge of the canvas – or even off the edge of the canvas. One of the most famous examples of this technique, called anamorphism, appears in a 1533 painting by the German-born artist Hans Holbein the Younger. The Ambassadors (Fig. 11) appears to show a wealthy landowner and a Bishop surrounded by objects both secular and religious. Toward the bottom of the painting is an odd gray-and-black smear; this smear is in fact meant to be viewed from the extreme right edge of the painting. A viewer standing at this extreme angle would not be able to see the men and their possessions clearly, but would clearly be able to see a skull hidden in plain view within the painting (Fig. 12).
Fig. 11

The Ambassadors by Hans Holbein the Younger (Holbein, 1533)

Fig. 12

Viewed from the extreme right and close to the canvas, the smear on The Ambassadors appears to be a skull

Anamorphic art is hardly confined to the sixteenth century; it abounds today in curated museum shows, in public spheres (e.g., in the New York subway system), and in art-gone-viral (just perform an Internet search for the sidewalk art of the chalk artist Julian Beever (Beever 2019)). See “Anamorphosis: Between Perspective and Catoptrics” for a fuller treatment of the topic. Anamorphism has its practical aspects, too: turn arrows painted on roadways look highly distorted when seen from directly above but appear correct to the drivers approaching along the road. There are parking garages that paint anamorphic exit signs, which make sense to the cars needing to leave the building but appear to be a jumble otherwise.

Impossible Figures

The above examples show how perspective art can “hide” or distort the image of a real-world, three-dimensional object within a two-dimensional canvas. But perspective art can also make unreal objects appear to exist. One famous example is the Penrose triangle (Fig. 13), popularized in the 1950s by the father-son team of Lionel and Sir Roger Penrose, a psychologist and mathematician.
Fig. 13

A Penrose triangle is an “impossible figure”

This triangle is one of the simplest and most iconic examples of what we call “impossible figures.” Locally, at each corner of the object, this appears to be the image of a solid three-dimensional object made of flat surfaces with linear edges. But the object as a whole contradicts the local analysis. For example, as we travel around the object counter-clockwise, each subsequent corner appears to be closer to the viewer than the previous one – an impossibility in a closed loop!

Many artists include “impossible figures” in their work, including Swedish artist Oscar Reutersvärd – who is credited with the 1930s discovery of the triangle that would later bear the Penrose name – and M.C. Escher, whose Waterfall, Ascending and Descending, and Belvedere (among many others) have captivated and perplexed generations of curious viewers.

For this reason, it’s especially interesting that artists have created three-dimensional statues depicting impossible figures – see, for example, Fig. 14. These statues, even more than their two-dimensional counterparts, require a strict alignment with a particular viewing position for the illusion to be effective.
Fig. 14

Two views of a Penrose Triangle sculpture at the Deutsches Technikmuseum, Berlin, February 2008 (Deutsches-Technikmuseum, 2008)

The observation that the same object (such as the Penrose Triangle sculpture above) can have very different appearances when viewed from two different locations is one of the reasons that reconstructing three-dimensional objects from two-dimensional photographs is so challenging. This challenge is the focus of the next section.

Going Backward from Pictures to 3D

In the centuries that saw Desargues, Canaletto, and Poncelet, the task of drawing accurate images and maps was a significant technological challenge. But in today’s world – where cameras are built into cell phones – accurate images surround us. The ubiquity of digital images has allowed us to attempt a new technological challenge of our day: to recreate a three-dimensional world from a collection of photographs.

See for example Fig. 15, the lead figure from a highly cited paper entitled “Building Rome in a Day” by Agarwal et al. (2011). In this rendering of the Colosseum, each triangle in the picture is the location of one of more than 2000 cameras that had uploaded photos to Flickr.
Fig. 15

A reconstruction of the Colosseum from photographs uploaded to Flickr (Agarwal et al., 2011)

The authors describe their work in this way:

Entering the search term “Rome” on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.

We are all familiar with computer games that allow us to move through a virtual 3D world, and also with online sites (such as Google Maps) that allow us to virtually “move” through city streets while seated at our computers. These newly familiar experiences rely on already knowing the structure of space. Virtual gaming worlds have a three-dimensional structure already encoded into the software; Google Maps takes images from satellite or roving, calibrated cameras with GPS coordinates encoded into the image.

What makes the work of Agarwal et al. a geometrical challenge is the almost complete absence of a priori geographic or spatial information. Piecing the world back together from a random collection of photographs is like fitting together a jigsaw puzzle with a million pieces, some of which are missing and many of which are redundant. (Almost no one takes pictures of the dumpster behind the grocery store; millions of people take photographs of a famous statue.)

Reconstructing three-dimensional objects from a collection of photographs requires going through these three steps:
  1. identifying feature points or lines that match across images;
  2. doing a reconstruction from pairs or possibly triplets of images; and
  3. piecing together and refining these many reconstructions using optimization.

The first step requires careful use of a cluster of computers, one of which is designated as the “master node” that distributes images to individual computers (nodes) in a balanced manner. The nodes each toil away at pre-processing images by verifying they are readable and extracting available camera information (if any is attached). The process of matching images is not entirely random; in the same way most people begin solving a jigsaw puzzle by looking for edge pieces, the matching algorithm uses a library of SIFT (Scale Invariant Feature Transform) features.

Likewise, the third (final) step makes intensive use of computational algorithms that are outside the scope of this chapter.

Step two is where projective geometry comes in; this step requires “undoing” the kind of perspective map that Desargues and Canaletto mastered long ago. This step is the basis for the field of multiple view geometry, an increasingly fertile area of research for theoretical and applied mathematicians alike. Indeed, the author was introduced to this subject at an energetic week-long gathering of university professors and Google engineers at a conference on Algebraic Vision hosted by the American Institute of Mathematics in summer 2016.

To describe the locations of real-world points and their photographic images in a way that is amenable to computer algorithms, we will need to understand homogeneous coordinates for space; that is the subject of the next section.

Homogeneous Coordinates

To motivate the use of homogeneous coordinates (as contrasted with Cartesian coordinates), we return to the notion of an observer positioned at the origin \(\textbf {0} = (0,0,0)\in \mathbb {R}^3\), gazing at the world through a picture plane. To this observer, every point along a given line of sight will map to the same point on the picture plane. In particular, points (x, y, z) and (λx, λy, λz) have the same image whenever λ≠0. If the picture plane is z = 1, for example, then both of these points map to \(\displaystyle \left ( \frac xz, \frac yz, 1 \right )\).

In this vein, we form \(P^2({\mathbb {R}})\), the projective plane, as equivalence classes of points in \(\mathbb {R}^3\setminus \{ \textbf {0} \}\). A point in \(P^2({\mathbb {R}})\) can be written in homogeneous coordinates in the form \(\left [ x : y : z \right ]^T\) for \((x,y,z)\in \mathbb {R}^3\setminus \{\textbf {0} \}\); we say
$$\displaystyle \begin{aligned}\left[ \begin{array}{c} x\\y\\z \end{array}\right] = \left[ \begin{array}{c} \lambda x \\ \lambda y \\ \lambda z \end{array}\right] \end{aligned}$$
whenever λ≠0. Just as points in \(P^2({\mathbb {R}})\) correspond to real lines through the origin, lines in \(P^2({\mathbb {R}})\) correspond to real planes through the origin. Said another way, projective points \(\left [ {x_1} : {y_1} : {z_1} \right ]^T\), \(\left [ {x_2} : {y_2} : {z_2} \right ]^T \), and \(\left [ {x_3} : {y_3} : {z_3} \right ]^T \) are collinear in \(P^2({\mathbb {R}})\) precisely when the real points \((x_1, y_1, z_1)\), \((x_2, y_2, z_2)\), and \((x_3, y_3, z_3)\) lie in a common plane through the origin in \(\mathbb {R}^3\).

The projective plane \(P^2({\mathbb {R}})\) and the Extended Euclidean plane \(\mathbb {E}^2\) (see section “A New Mathematical Object: The Point of Projective Geometry”) have a natural correspondence. If we think of \(\mathbb {E}^2\) as the extension of the particular plane z = 1, then we can identify the projective point \(\left [ a : b : c \right ]^T \) with the ordinary point \((\frac ac, \frac bc, 1)\) whenever c≠0; projective points of the form \(\left [ a : b : 0 \right ]^T \) correspond to ideal points in \(\mathbb {E}^2\). This makes some intuitive sense, as these correspond to the observer’s lines of sight that are parallel to the picture plane, and so “intersect” the plane z = 1 “at infinity”.
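As a small illustration of these definitions, the following Python sketch tests whether two coordinate triples name the same projective point, identifies a projective point with an ordinary point of the plane z = 1 when that is possible, and checks collinearity through the criterion above (a plane through the origin contains three points exactly when a 3 × 3 determinant vanishes). The function names `same_point`, `to_plane`, and `collinear` are ours, chosen only for this example, and integer coordinates are assumed so that the arithmetic is exact.

```python
from fractions import Fraction

def cross(u, v):
    """Cross product of two coordinate triples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def same_point(p, q):
    """[p] = [q] in P^2(R) exactly when p and q are nonzero and parallel."""
    return any(p) and any(q) and cross(p, q) == (0, 0, 0)

def to_plane(p):
    """Ordinary point (a/c, b/c) of the plane z = 1, or None for an ideal point."""
    a, b, c = p
    if c == 0:
        return None          # [a : b : 0] is an ideal point
    return (Fraction(a, c), Fraction(b, c))

def collinear(p, q, r):
    """True when p, q, r lie in a common plane through the origin of R^3."""
    n = cross(q, r)          # normal to the plane spanned by q and r
    return p[0] * n[0] + p[1] * n[1] + p[2] * n[2] == 0

# [1 : 2 : 1] and [3 : 6 : 3] are the same projective point.
print(same_point((1, 2, 1), (3, 6, 3)))
# The ordinary points (1, 0), (0, 1), (2, -1) all lie on the line x + y = 1.
print(collinear((1, 0, 1), (0, 1, 1), (2, -1, 1)))
# [1 : 1 : 0] has no ordinary counterpart on the plane z = 1.
print(to_plane((1, 1, 0)))
```

Using exact fractions rather than floating-point numbers keeps the equality tests meaningful; with measured (noisy) data one would instead compare against a small tolerance.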

We define \(P^3({\mathbb {R}})\) analogously: projective points take the form
$$\displaystyle \begin{aligned}\left[ x : y : z : w \right]^T = \left[ {\lambda x} : {\lambda y} : {\lambda z} : {\lambda w} \right]^T \in P^3({\mathbb{R}}) \end{aligned}$$
for \((x,y,z,w)\in \mathbb {R}^4\setminus \{\textbf {0}\}\) and λ≠0. As before, we can find a natural correspondence between \(P^3({\mathbb {R}})\) and \(\mathbb {E}^3\) (say, via the identification using w = 1). These homogeneous coordinates underlie much of the field of analytical projective geometry.
To understand how the use of homogeneous coordinates helps us understand camera projections, consider the case of an observer standing at \(\left [ 0 : 0 : 0 : 1 \right ]^T \), looking through a planar window located at z = d, w = 0, which we think of as an embedding of \(P^2({\mathbb {R}})\subset P^3({\mathbb {R}})\). To such an observer, the point \(\left [ x : y : z : w \right ]^T \) with z≠0 would have an image at the point \(\left ( \frac {dx}z, \frac {dy}z \right )\) of the window; in homogeneous coordinates on the window, this image is
$$\displaystyle \begin{aligned}\left[ {\frac {dx}z} : {\frac {dy}z} : 1 \right]^T = \left[ {dx} : {dy} : z \right]^T. \end{aligned}$$
That is, we can compute the transformation \(P^3({\mathbb {R}})\to P^2({\mathbb {R}})\) above via the matrix multiplication
$$\displaystyle \begin{aligned}P \left[ \begin{array}{c} x\\y\\z\\w \end{array}\right] = \left( \begin{array}{cccc} d & 0 & 0 & 0 \\ 0 & d & 0 & 0\\ 0 & 0 & 1 & 0 \end{array} \right) \left[ \begin{array}{c} x\\y\\z\\w \end{array}\right]= \left[ \begin{array}{c} dx\\dy\\z \end{array}\right] . \end{aligned}$$
The computation above shows why algebraic geometers define a camera to be a 3 × 4 matrix. Moving the viewer, shifting the film, rotating the image plane, or using a camera with non-square pixels has the effect of changing the entries of the camera matrix P. (See Hartley and Zisserman (2003, Chapter 6) for a fuller description.)
Figure 16 demonstrates putting this into practice in a rather simple spreadsheet. In this sheet, we draw the one-point perspective image of a cube; the viewing distance is 4 and the viewing target is (2, 7).
Fig. 16

Using a spreadsheet to draw the perspective image of a cube with viewing distance 4 and viewing target (2, 7)
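The same kind of computation can be sketched in a few lines of Python. This is not the spreadsheet of Fig. 16: it is a minimal illustration that applies the matrix P above with viewing distance d = 4 to a hypothetical cube, and it assumes the viewer is at the origin, so it does not model a viewing target other than the point of the window directly in front of the viewer. The helper names and the cube's coordinates are our own choices.

```python
def camera(d):
    """The 3x4 matrix P from the text: viewer at the origin, window z = d."""
    return [[d, 0, 0, 0],
            [0, d, 0, 0],
            [0, 0, 1, 0]]

def project(P, X):
    """Multiply a 3x4 camera matrix by a homogeneous point [x, y, z, w]."""
    return [sum(p * x for p, x in zip(row, X)) for row in P]

def picture_point(P, X):
    """Window coordinates (u, v): the image [dx : dy : z] rescaled to [u : v : 1]."""
    a, b, c = project(P, X)
    return (a / c, b / c)

# Hypothetical cube of side 2: front face at depth z = 8, back face at z = 10.
cube = [[x, y, z, 1] for z in (8, 10) for y in (-1, 1) for x in (-1, 1)]

P = camera(4)
for X in cube:
    print(X[:3], "->", picture_point(P, X))
```

In this sketch the back face of the cube is drawn as a smaller square concentric with the front face, so the images of the four edges that run away from the viewer point toward a single vanishing point at the center of the window, as in a one-point perspective drawing.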

Multiple View Geometry

How do we recover information about a three-dimensional world from two-dimensional images?

Suppose we have two images of the same real-world object. Usually, one of the first steps in reconstruction of the 3D scene is to determine what is called the fundamental mapping taking points in the first image α to a certain set of lines in the second image β. The description below explains how and why this mapping emerges.

We say points xα ∈ α and xβ ∈ β are corresponding points if they are images via the appropriate maps of a common point \(X\in P^3({\mathbb {R}})\). That is, X projects onto xα ∈ α from the point Oα, and X projects onto xβ ∈ β from the point Oβ. Then the five points X, xα, Oα, xβ, and Oβ are necessarily coplanar. Note that the line (OαOβ) – called the epipolar line here, and often called the baseline elsewhere – lies in every such plane constructed from corresponding points. Of particular interest along this line are the epipolar points (or epipoles) eα = α ⋅ (OαOβ) and eβ = β ⋅ (OαOβ). We can think of eα as the image in α of the camera at Oβ, and eβ as the image in β of the camera at Oα.

The point xα might correspond to several different points in the plane β. For example, the camera at Oα might appear to show a tree growing out of a person’s head: the point xα could come from both the person’s hat and the trunk of the tree. The images of the hat and trunk in another photograph β might not coincide with each other, but because of the coplanar relationship described in the previous paragraph and illustrated in Fig. 17, they must be collinear with the epipolar point eβ. Accordingly, a pair of photographs of the same scene, taken from two different camera locations, describes a function from points xα in α to lines (eβxβ) in β. This function is called the fundamental mapping.
Fig. 17

The point X and its images xα and xβ lie in a plane with the line containing the centers (Oα and Oβ) and the epipoles (eα and eβ)

Because xα and xβ can be thought of as points in \(P^2({\mathbb {R}})\), we can represent the fundamental mapping with a 3 × 3 matrix F, called the fundamental matrix. In general, we can determine F from 7 pairs of corresponding points in general position (the matrix has nine entries, but it is defined only up to a nonzero scalar multiple and has rank 2, so it has 7 degrees of freedom). For each of these corresponding pairs of points, the mapping satisfies Fxα = (eβxβ); that is to say,
$$\displaystyle \begin{aligned}x_\beta^T F x_\alpha = 0. \end{aligned}$$
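To see how each pair of corresponding points constrains F, it may help to write this condition out in coordinates. As an illustration (the symbols u, v, u′, v′, and \(f_{ij}\) are our notation), suppose both points are ordinary points scaled so that \(x_\alpha = \left [ u : v : 1 \right ]^T\) and \(x_\beta = \left [ u' : v' : 1 \right ]^T\), and write \(f_{ij}\) for the entries of F. Then the condition becomes
$$\displaystyle \begin{aligned}u'u\,f_{11} + u'v\,f_{12} + u'\,f_{13} + v'u\,f_{21} + v'v\,f_{22} + v'\,f_{23} + u\,f_{31} + v\,f_{32} + f_{33} = 0, \end{aligned}$$
which is a single homogeneous linear equation in the nine unknown entries of F. Each pair of corresponding points therefore contributes one such equation.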
In the previous section, we described a camera as a 3 × 4 matrix. If we have two images α and β, then the fundamental matrix allows us to describe a relationship between the two cameras Pα and Pβ which created the two images. Why is this? For any point \(X\in P^3({\mathbb {R}})\), we have
$$\displaystyle \begin{aligned}(X^T P_\beta^T) F (P_\alpha X) = (x_\beta^T) F x_\alpha = (x_\beta^T) (e_\beta x_\beta) = 0. \end{aligned}$$
Therefore, it follows that \(P_\beta ^T F P_\alpha \) is a skew-symmetric matrix. This fact is a foot-in-the-door for developing reconstruction algorithms.
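The two facts above can be checked numerically in a simple special case. The Python sketch below assumes a hypothetical pair of cameras, Pα = [I | 0] and a translated copy Pβ = [I | t], for which the fundamental matrix is known to be the cross-product matrix of t; the cameras, the vector t, and the scene point X are all our own choices, made only for illustration. Integer entries keep the arithmetic exact.

```python
def cross_matrix(t):
    """The 3x3 matrix [t]_x, for which [t]_x v is the cross product t x v."""
    tx, ty, tz = t
    return [[0, -tz, ty],
            [tz, 0, -tx],
            [-ty, tx, 0]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical cameras: alpha centered at the origin, beta a translated copy.
t = [3, -1, 2]
P_alpha = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_beta = [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]]]
F = cross_matrix(t)                    # fundamental matrix of this special pair

X = [2, 5, 7, 1]                       # a point of the scene
x_alpha = apply(P_alpha, X)            # its image in alpha
x_beta = apply(P_beta, X)              # its image in beta
line = apply(F, x_alpha)               # coefficients of the line (e_beta x_beta)
e_beta = apply(P_beta, [0, 0, 0, 1])   # image in beta of the center O_alpha

residual = dot(x_beta, line)           # x_beta^T F x_alpha
M = matmul(matmul(transpose(P_beta), F), P_alpha)
is_skew = all(M[i][j] == -M[j][i] for i in range(4) for j in range(4))

print("x_beta^T F x_alpha =", residual)
print("e_beta lies on the line:", dot(e_beta, line) == 0)
print("P_beta^T F P_alpha is skew-symmetric:", is_skew)
```

For general cameras the fundamental matrix is not simply a cross-product matrix, and with real photographs the residual is only approximately zero; the sketch is meant to make the algebra concrete, not to serve as a reconstruction algorithm.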

How, then, do we use the fundamental matrix to reconstruct the real-world scene? The answer is not simple, as the figure of the Ames room below shows.

The Ames Room

The Ames room, designed by perceptual psychologist Adelbert Ames, Jr., is an illusion room. Viewers who peer into the room from a peephole in the wall seem to see objects that grow and shrink as the objects move from one side to the other. The illusion works because from the correct vantage point, the room appears to be a “normal,” rectangular room. But in fact, the walls, ceiling, and floors are trapezoids, with the short edges close to the vantage point and the long edges far from the vantage point. The illusion that the room is rectangular, and not trapezoidal, can be hard to overcome, even when viewers have been inside the room or see people they know walking through it, appearing to shrink or grow as they walk (Fig. 18).
Fig. 18

Ames room: “Room constructed to make a person appear large or small depending on perspective, in the city of Rio de Janeiro, Brazil.” (Courtesy of Andrevruas) (Andrevruas, 2011)

Said another way, the Ames room is projectively equivalent to a normal room; there is a collineation \(P^3({\mathbb {R}})\to P^3({\mathbb {R}})\) (a function that takes points to points and lines to lines) that maps the Ames room onto a normal, rectangular room. For this reason, the methods described above can determine the relationship between two cameras – and thereby the reconstruction of the three-dimensional scene – only up to projective equivalence. The fundamental mapping by itself can help us distinguish between an Ames room and an A-frame house, but it can’t tell an Ames room from a regular rectangular room. We can’t extract distance or angle measurements of real-world objects without a priori information about the scene or the cameras.
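This ambiguity can be made concrete with a short computation. If H is any invertible 4 × 4 matrix (a collineation of \(P^3({\mathbb {R}})\)), then replacing every scene point X by HX and every camera P by \(PH^{-1}\) leaves every image unchanged, because \((PH^{-1})(HX) = PX\). The Python sketch below checks this for a hypothetical pair of cameras, a hypothetical scene, and one particular non-affine H; all of these values are our own choices for illustration.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# A hypothetical collineation H and its inverse. H alters the last coordinate,
# so it is not an affine map: it distorts distances, angles, and parallelism.
H = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 2, 1]]
H_inv = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -2, 1]]

# Two hypothetical cameras and a few scene points [x, y, z, 1].
P_alpha = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_beta = [[1, 0, 0, 3], [0, 1, 0, -1], [0, 0, 1, 2]]
cameras = [P_alpha, P_beta]
scene = [[2, 5, 7, 1], [-1, 4, 9, 1], [0, 3, 6, 1]]

# The distorted scene and the correspondingly adjusted cameras.
warped_scene = [apply(H, X) for X in scene]
warped_cameras = [matmul(P, H_inv) for P in cameras]

original_images = [[apply(P, X) for X in scene] for P in cameras]
warped_images = [[apply(Q, Y) for Y in warped_scene] for Q in warped_cameras]

print("scene changed: ", warped_scene != scene)
print("images changed:", warped_images != original_images)
```

Both pairs of images are identical even though the two scenes differ, so no computation based on the images alone can tell the two scenes apart.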

Reconstructing Objects from Images

Knowing real-world information vastly increases the ease with which we can reconstruct objects from images. A “calibrated camera” makes the reconstruction process much simpler. For instance, many modern digital cameras come available with GPS information encoded into the image. For even more accuracy, many 3D scanners use a known camera that is a fixed distance from a turntable rotating at known angles. Knowing the focal length of the camera allows us to account for phenomena such as the dolly zoom; knowing the viewing target allows us to account for anamorphic effects (see section “A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism”).

Real-world information is useful as well. Note that in analyzing Canaletto’s painting in section “Where Was the Camera?,” we used standard observations about real-world parallel lines, and also about real-world perpendicular lines, to gain information about Canaletto’s viewing position. In general (meaning, if the scene is not an Ames-room-like scene), this kind of assumption means that reconstructing scenes with architectural features is simpler than, say, reconstructing landscapes. We can see the importance of knowing such geometric information for understanding drawings like “LIFE” (Fig. 7) or the Penrose sculpture (Fig. 13).

In the analysis of Canaletto’s painting, we also used information about proportions (by assuming the arch was a semicircle) and about actual size (e.g., the heights of the people pictured). This kind of detective work is another part of reconstruction; without it, we can’t distinguish between photographs of, for example, a single-family home and a doll’s house.

In practice, the task of reconstruction is further complicated by “noise” and error: points are infinitesimal, but pixels are discrete and finite. So optimization and error analysis also enter into reconstruction algorithms.

Nonetheless, at the heart of any reconstruction lies the language of homogeneous coordinates and analytical projective geometry.

Conclusion

The long and storied history of projective geometry weaves itself through the last half-millennium of mathematics; it is a subject that has been discovered and rediscovered by mathematicians searching for answers beyond Euclidean geometry. Its reemergence under Poncelet points to the aesthetic elegance of its axiomatic structure; the subject has also led to deeper understandings of conics (e.g., under the influence of Steiner) and of topology (e.g., under Möbius).

But its utility in perspective drawings and photographs is where the subject of projective geometry becomes most applied and touches our lived experiences most directly. With the passing of time, this tool is becoming even more relevant and powerful than when Desargues first introduced it. We live in a world that is increasingly visual, a world in which technology creates, reproduces, and alters images constantly; analytic projective geometry is the machinery that allows us to create, explain, and analyze these digitized images.

Beyond the technical aspect of analyzing digital images, constructive projective geometry gives us all a way to see our surroundings and the objects in them: to better understand how to look at paintings or our vacation photographs, to create or to dispel illusions, and to interpret the way we look at our wonderful, three-dimensional world.

References

  1. Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz SM, Szeliski R (2011) Building Rome in a day. Commun ACM 54(10):105–112. With a Technical Perspective by Prof. Carlo Tomasi
  2. Andersen K (2006) The geometry of art: the history of the mathematical theory of perspective from Alberti to Monge. Springer, New York
  3. Andrevruas (2011) Português: Casa construída de forma a fazer a pessoa parecer grande ou pequena dependendo da perspectiva, na cidade do Rio de Janeiro, 24 Jan 2011. https://commons.wikimedia.org/wiki/File:Casaperspectiva.jpg, from Wikimedia Commons
  4. Beever J (2019) Julian Beever’s website. http://www.julianbeever.net/
  5. Boing Boing (2015) Watch 23 of the best dolly zooms in cinematic history, 26 Jan 2015. https://boingboing.net/2015/01/26/watch-23-of-the-best-dolly-zoo.html
  6. Bosse A (1648) Manière universelle de Mr. Desargues, pour pratiquer la perspective par petit-pied, comme le Geometral, Paris
  7. Byers K, Henle J (2004) Where the camera was. Math Mag 77(4):251–259
  8. Canaletto GA (circa 1730) The Clock Tower in the Piazza San Marco. https://commons.wikimedia.org, oil on canvas, 69.22 × 86.36 cm, current location at the Nelson-Atkins Museum of Art
  9. Carroll L (1871) Through the looking-glass. Macmillan & Co, London
  10. Crannell A (2006) Where the camera was, take two. Math Mag 79(4):306–308
  11. Desargues G (1987) Exemple de l’une des manieres universelles du s.g.d.l. touchant la pratique de la perspective sans emploier aucun tiers point, de distance ny d’autre nature, qui soit hors du champ de l’ouvrage. In: The geometrical work of Girard Desargues. Springer, New York, p 1636
  12. Deutsches-Technikmuseum (2008) Penrose triangle sculpture. https://commons.wikimedia.org/w/index.php?curid=3597501, images from Wikimedia Commons
  13. Futamura F, Lehr R (2017) A new perspective on finding the viewpoint. Math Mag 90(4):267–277
  14. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, New York
  15. Holbein H (1533) The Ambassadors. https://commons.wikimedia.org, oil on oak, 209.5 cm × 207 cm
  16. Robin AC (1978) Photomeasurement. Math Gaz 62:77–85
  17. Taylor B (1719) New principles of linear perspective: or the art of designing on a plane the representations of all sorts of objects, in a more general and simple method than has been done before, London
  18. Tripp C (1987) Where is the camera? The use of a theorem in projective geometry to find from a photograph the location of a camera. Math Gaz 71:8–14

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Franklin & Marshall College, Lancaster, USA

Section editors and affiliations

  • Bharath Sriraman (1)
  • Kyeong-Hwa Lee (2)
  1. Department of Mathematical Sciences, The University of Montana, Missoula, USA
  2. Department of Mathematics Education, College of Education, Seoul National University, Seoul, South Korea
