# Looking Through the Glass

## Abstract

Projective geometry allows us, as its name suggests, to project a three-dimensional world onto a two-dimensional canvas. A perspective projection often includes objects called *vanishing points*, which are the images of projective *ideal points*; the geometry of these points frequently allows us to either create images or to reconstruct scenes from existing images. We give a particular example of using a pair of vanishing points to locate the position of the artist Canaletto as he painted the Clock Tower in the Piazza San Marco. However, because mappings from three-dimensional space to a two-dimensional plane are not invertible, we can also use perspective and projective techniques to create and analyze illusions (e.g., anamorphic art, impossible figures, the dolly zoom, and the Ames room). Moving beyond constructive (e.g., ruler and compass) projective geometry into analytical projective geometry via homogeneous coordinates allows us to create and analyze digital perspective images. The ubiquity of digital images in the present day allows us to ask whether we can use two (or many) images of the same object to reconstruct that object in part or in entirety. Such a question leads us into the emerging field of *multiple view geometry*, straddling projective geometry, algebraic geometry, and computer vision.

## Keywords

Linear perspective, Multiple view geometry, Projective geometry, Anamorphism

## Introduction

This chapter is about perspective art, and in particular about the role that projective geometry plays in perspective art. Most people are aware that perspective techniques began to flourish during the Renaissance, and as a result drawings and paintings of that era became demonstrably more “realistic” or “lifelike” than art in previous eras. Now we are living through a similar Renaissance, especially in the technological realm (which includes our animated movies, video games, medical imaging, and more). The mathematics that transformed our world several centuries ago still flourishes around us; it continues to have relevance and power in the way we look at the world today.

The word *perspective* comes from the Medieval Latin roots *per* (“through”) and *specere* (“look at” – the same root that gives us “spectacles”). So the title of this chapter is a deliberate pun: like the book written in 1871 by the mathematician Charles Dodgson under his pen name, Lewis Carroll (1871), perspective art literally intends us to look through a window to see the objects it portrays lying on the other side. And as Carroll’s book suggests, sometimes the view that we get by looking through the glass will give us glimpses of the world that are surprising – even wonderful – feats of illusion and magic.

## A Brief History

There is a lore that projective geometry has been a subject intimately connected with, and arising from, the development of perspective art. That lore is not entirely in accordance with historical fact. (For a much more comprehensive description of the history of perspective art than this chapter can provide, see Andersen’s excellent volume (Andersen 2006).)

The formal introduction of linear perspective is generally credited to Filippo Brunelleschi, an Italian designer, architect, and engineer who lived 1377–1446 (See also “Renaissance Architecture”). His perspective demonstrations relied extensively on geometry but also on physical apparatuses – he interposed mirrors between his canvas and the pictured scenes to validate the accuracy of his images. Brunelleschi’s work had an almost immediate influence on Leon Battista Alberti (1404–1472), an Italian polymath (architect, priest, artist, and author). In 1435, Alberti published *Della pittura*, his seminal work on perspective, whose influence reached far and wide.

For two centuries, perspective art remained largely in the arena where Brunelleschi and Alberti had placed it: as an exercise in Euclidean geometry and engineering. The German mathematician and astronomer Johannes Kepler (1571–1630) may have been the first person to introduce the projective notion of “points at infinity.” However, Kepler’s motivation arose not from perspective art, but rather from developing a unified theory of conics (e.g., “closing up” the parabola).

Desargues’s work seems to have been lost or neglected in the period that followed, possibly because the algebraic approach to geometry put forward by his contemporary, René Descartes, proved more versatile. A century later, for example, the artist Canaletto (whom we will return to in section “Where Was the Camera?”) was creating his paintings with the *camera obscura* rather than with geometry. Across the Channel, the English mathematician Brook Taylor (of *Taylor’s series* fame) would publish his highly celebrated “*New Principles of Linear Perspective: or the Art of Designing on a Plane the Representations of all sorts of Objects, in a more General and Simple Method than has been done before*” (Taylor 1719). But in spite of the promise of the first word of this title, the book contained very little that was “new”; it relied almost exclusively on Euclidean geometry (moreover, it was often described as far from “simple” to read).

Two centuries after Desargues introduced projective geometry, another French engineer and mathematician – Jean-Victor Poncelet (1788–1867) – resurrected it. Famously, Poncelet wrote much of what would become his “Traité des propriétés projectives des figures” during a two-year imprisonment; he had been captured during Napoleon’s campaign against the Russian Empire. Poncelet’s geometry was axiomatic and theoretical, and was not explicitly motivated by, nor applied to, perspective art.

But fittingly, given the coincident geometric contributions of Desargues and Descartes, it is in the realm of analytical projective geometry where we see recent, exciting applications to perspective images, as well as to reconstructing the objects that make those images. In the sections that follow, we build from perspective applications of “traditional” (ruler and compass) projective geometry toward these analytical applications.

## A New Mathematical Object: The Point of Projective Geometry

Traditional perspective art assumes that there is an artist looking with one eye through a window or canvas at the world. We call the location of the viewer’s eye the *center of the projection* and denote it by the point *O*; we’ll denote the picture plane by the Greek letter *ρ*, and the image of a real-world point *X* on the canvas *ρ* we’ll denote by the symbol *X′*.

There are other physical setups that give us similar projections on planes. For example, a camera might have a lens or pin-hole that projects objects in the real world onto a sheet of film or a set of pixels; again, we call the lens the *center* *O* of the projection, with the film lying in a plane *ρ* and the object and its image similarly denoted by *X* and *X′*, respectively. Or we might have a light source casting a shadow on the ground; the light source in this case would play the role of the center *O*; the ground becomes the image plane *ρ*, and the object and its shadow are *X* and *X′*.

What all these situations have in common is that the points *O*, *X*, and *X′* are collinear and that *X′* is the intersection of the line through *O* and *X* with the plane *ρ*. (In shorthand mathematical notation, we write *X′* = (*OX*) ⋅ *ρ*.)
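As a minimal numerical sketch of the relation *X′* = (*OX*) ⋅ *ρ*, the Python function below intersects the line through *O* and *X* with a plane. The function name `project`, the description of the plane by a point and a normal vector, and the sample coordinates are illustrative assumptions rather than notation from this chapter.

```python
def project(O, X, plane_point, plane_normal):
    """Return X' = (OX) . rho, or None if the line OX is parallel to rho.

    By assumption, the plane rho is described by one point lying on it
    and by a normal vector.
    """
    direction = [x - o for o, x in zip(O, X)]
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if denom == 0:
        return None  # no Euclidean intersection: OX is parallel to rho
    t = sum(n * (p - o) for n, p, o in zip(plane_normal, plane_point, O)) / denom
    return tuple(o + t * d for o, d in zip(O, direction))


# Eye at the origin, canvas in the plane z = 1.
O = (0.0, 0.0, 0.0)
canvas_point, canvas_normal = (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)
print(project(O, (2.0, 4.0, 2.0), canvas_point, canvas_normal))  # (1.0, 2.0, 1.0)
print(project(O, (3.0, 1.0, 0.0), canvas_point, canvas_normal))  # None
```

The second call returns `None` because the line of sight is parallel to the canvas, so in Euclidean terms there is no image point.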

This simple notion runs into difficulties, however, if the point *X* lies in an “awkward” place: if the line *OX* is parallel to the plane *ρ*, then the intersection (*OX*) ⋅ *ρ* is empty (at least in the usual realm of Euclidean geometry). Fortunately for artists, this situation does not seem to arise often; if an artist wanted to draw her feet (which presumably are directly below her eye), she would tilt the picture plane rather than leaving the canvas vertical. A much more frequent artistic conundrum is that sometimes the image *X′* appears to exist even though the object *X* does not: this situation arises in the case of the well-known *vanishing point*. The vanishing point where the two railroad tracks appear to meet together on the horizon plays an extremely important role in a perspective picture, even though there is no such point in the real world.

### Ideal Points

To counteract both of the above difficulties with a single solution, mathematicians expanded the notion of Euclidean space to a larger space; if we use analytic properties such as coordinates in this space, we call it “projective space” (\(P^3({\mathbb {R}})\)), or if we use purely geometric properties, we call it “Extended Euclidean space” (\(\mathbb {E}^3\)). This larger space includes not only all the familiar points in \(\mathbb {R}^3\), but also an additional set of points called *ideal points* (or sometimes *points at infinity*). In the spaces \(P^3({\mathbb {R}})\) and \(\mathbb {E}^3\), we must alter our conception of parallel lines; in particular, lines in \(\mathbb {R}^3\) that are parallel meet in \(\mathbb {E}^3\) at an ideal point. We will delve further into \(P^3({\mathbb {R}})\) in section “Homogeneous Coordinates”; until then, this text will only need the geometric properties of \(\mathbb {E}^3\).

Ideal points are created by what we call a *formal* definition, meaning that the definition itself “forms” the object. This kind of definition is different than one that merely identifies an existing object: we could define \(\sqrt {2}\) to be “the positive real number *x* with the property that *x*^{2} = 2.” The definition of \(\sqrt {2}\) is not a *formal* definition, because such a number already exists in \(\mathbb {R}\). But the definition of ideal points creates something new, in the same way that defining the imaginary number *i* to be “a number *z* with the property that *z*^{2} = −1” creates something that does not exist in \(\mathbb {R}\), leading to the formation of the complex plane \(\mathbb {C}\). In the same way, the space \(\mathbb {E}^3\) is larger than and has different properties from \(\mathbb {R}^3\).

In particular, in \(\mathbb {E}^3\), every line and plane intersect in a point (unless the line is a subset of the plane, in which case their intersection is a line). This means that if the center *O* does not lie in the image plane *ρ* and if *O* ≠ *X*, the image point *X′* = (*OX*) ⋅ *ρ* is always well defined.

Similarly, two lines in \(\mathbb {E}^3\) are coplanar if and only if they intersect in exactly one point. In this sense, as we noted above, “parallel” lines are coplanar and intersect in an ideal point. Artistically speaking, the existence of ideal points as the intersection of parallel lines allows us to say that if *X′* is a vanishing point in our picture, then the object *X* that it portrays exists and is a point “at infinity.”

Because vanishing points play such a crucial role in understanding perspective pictures, it is worth looking at these objects more carefully.

### Vanishing Points

Just as a parent is always the parent *of some child or group of children*, a “vanishing point” is always a vanishing point *of some line or collection of lines*. An examination of Fig. 3 shows that the line *ℓ* appears to vanish when the artist at point *O* is looking parallel to *ℓ*; it follows that a point *V* ∈ *ρ* is the vanishing point for the line *ℓ* if and only if the line *OV* is parallel to *ℓ*, that is, if and only if *V* is the image of the ideal point (the point at infinity) on *ℓ*.

It follows that if several lines *ℓ*_{1}, *ℓ*_{2}, *ℓ*_{3}, … are parallel to one another, then the line *OV* is parallel to all of them if and only if *OV* is parallel to any one of them, so the lines *ℓ*_{1}, *ℓ*_{2}, *ℓ*_{3}, etc. all have the same vanishing point. If the lines *ℓ*_{1}, *ℓ*_{2}, *ℓ*_{3}, etc. are parallel to one another but not parallel to the picture plane, it follows that *V* is a real (rather than ideal) point, so their images \(\ell _1^{\prime }, \ell _2^{\prime }, \ell _3^{\prime }, \) etc. are not parallel but rather all intersect at that point *V* (giving us, e.g., the drawing of the railroad tracks that converge at a point in the horizon). If the lines are parallel to one another and also parallel to the picture plane, then *OV* is likewise parallel to the picture plane, implying *V* is an ideal point, and so the images \(\ell _1^{\prime }, \ell _2^{\prime }, \ell _3^{\prime }, \) etc. will all be parallel to each other (as well as to the original lines).
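The claim that parallel lines share a vanishing point can be checked numerically. The sketch below assumes, purely for illustration, an eye at the origin and a canvas in the plane *z* = 1; the helper names and the sample lines are hypothetical.

```python
def image(X):
    """Project X onto the canvas z = 1 from an eye at the origin (assumed setup)."""
    x, y, z = X
    return (x / z, y / z)


def vanishing_point(direction):
    """Canvas point where the line of sight parallel to `direction` meets z = 1.

    Returns None when the direction is parallel to the canvas, in which
    case the vanishing point is an ideal point.
    """
    dx, dy, dz = direction
    if dz == 0:
        return None
    return (dx / dz, dy / dz)


def point_on_line(start, direction, t):
    """The point start + t * direction."""
    return tuple(s + t * d for s, d in zip(start, direction))


d = (1.0, 0.0, 2.0)          # common direction of two parallel lines
V = vanishing_point(d)       # (0.5, 0.0)
for start in [(-1.0, -1.0, 2.0), (3.0, -1.0, 2.0)]:
    far = image(point_on_line(start, d, 1e6))
    print(far, "is close to", V)
```

Points far along either line have images that crowd toward the same canvas point *V*, even though the two lines start at different places.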

Note that this definition of vanishing point implies something significant about interpreting a piece of art. If we know something about a set of lines (say, we can infer that the lines in the road were running perpendicularly to the canvas), and we can locate the vanishing points of those lines on the canvas, then this means we know something about the location *O* of the artist, and this location is something we explore further in the next section.

## Where Was the Camera?

In the previous section, we claimed that the location of vanishing points helps us determine the location of the artist or camera that made the picture. In this section, we explore the implications of this claim. Determining the location of an artist or of a camera is the source of a good amount of mathematical inquiry (see, e.g., references Byers and Henle 2004, Crannell 2006, Futamura and Lehr 2017, Robin 1978, and Tripp 1987). Moreover, the methods for solving this question lead to multiple applications, as we will see in the sections that follow (See also “Geometries of Light and Shadows from Piero della Francesca to James Turrell”).

Canaletto used a *camera obscura* to project images onto canvas, where he would capture them in paint. As such, his works give us excellent examples of perspective projections.

[…] *V* in the second floor of the building, near the main doorway and below the clock in the painting.

Because this third set of real-world lines is perpendicular to the canvas, it follows that Canaletto was perpendicularly across from the point *V* depicted in the picture – in other words, he was not standing on the ground, but was stationed on the second floor of another building.

[…] *D* directly above *V* (see Figs. 5 and 6). Because the slope of the real-world line is 1/2, the geometry of similar triangles allows us to deduce that the viewing distance (the distance from Canaletto to the canvas) is twice the length of the segment \(\overline {VD}\). Assuming the clock tower to be approximately 70 ft tall (based on its height relative to the people in the picture), we get that the height of the clock tower appears to be 35% of \(\| \overline {VD} \|\), so Canaletto was approximately 200 ft from the clock tower.

In conclusion, a few standard assumptions about Canaletto’s world (buildings were constructed with right angles, the arches were semicircles, and people were approximately the same height they are today) allow us to reconstruct the location of that artist as he painted this picture 300 years ago.

## A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism

Understanding where the artist stood is more than a historical exercise; it also has the power to affect how we view photographs and the apparent distortion within them. Almost every person has had the experience of seeing a breathtaking vista and trying to capture it on camera, only later to lament that the photograph didn’t do justice to the power of the original view. Often, the problem is not with the mechanics of the photograph or the photographer, but with the small size of the image coupled with the too-far distance of the person looking at the photograph. If the photograph were larger, or if its viewer were closer, the sense of awe for the vista might return.

[…]°, even though this vertex is supposed to represent a right-angled corner. We could have made the corner appear more like a right angle by placing the vanishing points further apart. But surprisingly, we can also make the corner appear more like a right angle by moving *ourselves* closer to the drawing. If a viewer moves uncomfortably close to this picture (in particular, if a person looks with one eye from a location very close to the × on the horizon), the angles in the word appear to be correct, 90° angles.

The viewer at *O*_{1} is far from the image – just as most readers of this chapter will view “LIFE” in Fig. 7 from a comfortable distance. The lines of sight to the two vanishing points for the viewer at *O*_{1} form an acute angle *θ*. Recall that when an artist draws a scene through a window, the vanishing points in the picture plane will lie on those lines of sight that are parallel to the lines she is drawing in the “real world.” Therefore, for the viewer at *O*_{1}, the drawing appears to depict an object that is likewise formed by the acute angle *θ*.

On the other hand, the viewer at *O*_{2} is closer to the picture plane, at a place where the lines of sight from *O*_{2} to the vanishing points are perpendicular. Therefore, for this viewer, the drawing appears to depict an object in the real world formed by lines that are likewise perpendicular to one another. In other words, if the drawing is supposed to depict an object with right angles, the closer viewer sees an “undistorted” picture, whereas the further viewer sees a distorted image.
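To make the dependence on viewing distance concrete, the sketch below computes the angle between the lines of sight to two vanishing points. It assumes a simplified layout that is not taken from the figures: the vanishing points sit on the horizon at signed positions `a1` and `a2` measured from the viewing target, and the eye is at distance `d` directly in front of that target. All numbers are hypothetical.

```python
import math


def apparent_angle(a1, a2, d):
    """Angle in degrees between the lines of sight to two vanishing points.

    a1, a2: signed positions of the vanishing points along the horizon,
            measured from the viewing target (assumed layout).
    d:      distance of the eye from the viewing target.
    """
    dot = a1 * a2 + d * d
    norm = math.hypot(a1, d) * math.hypot(a2, d)
    return math.degrees(math.acos(dot / norm))


def right_angle_distance(a1, a2):
    """Viewing distance at which the two lines of sight are perpendicular.

    Requires the viewing target to lie between the vanishing points (a1 * a2 < 0).
    """
    return math.sqrt(-a1 * a2)


a1, a2 = -4.0, 9.0
print(right_angle_distance(a1, a2))         # 6.0
print(apparent_angle(a1, a2, 6.0))          # approximately 90
print(apparent_angle(a1, a2, 30.0) < 90.0)  # True: a distant viewer sees an acute angle
```

In this layout the lines of sight are perpendicular exactly when *d*² = −*a*₁*a*₂, so there is a single "correct" viewing distance along the perpendicular through the target.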

The reason our photographs don’t capture what we remember seeing is not because the camera messed up; it is because we view the small photographs from too far away. Enlarging the photos or moving closer to the photos will restore the illusion of depth.

### Dolly Zoom

The *dolly zoom* has many other names, including the *Hitchcock zoom*, because it first appeared in that director’s film *Vertigo*, where it was pioneered by cameraman Irmin Roberts. In this zoom, the camera is placed along a track and pulled backward while simultaneously zooming in on the figure in the foreground.
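The geometry of this effect can be sketched with a simple pinhole model, in which the image height of an object is proportional to the focal length divided by the object's distance. The function names and all numerical values below are hypothetical and serve only as an illustration.

```python
def image_height(focal_length, object_height, distance):
    """Pinhole-camera image height by similar triangles (assumed model)."""
    return focal_length * object_height / distance


def dolly_zoom_focal_length(f0, d0, d):
    """Focal length that keeps a subject the same size after moving from d0 to d."""
    return f0 * d / d0


f0, d0 = 35.0, 2.0                           # hypothetical starting lens and distance
subject, background, gap = 1.8, 10.0, 20.0   # heights; the background is 20 units behind
for d in (2.0, 4.0, 8.0):                    # the camera pulls backward along the track
    f = dolly_zoom_focal_length(f0, d0, d)
    print(d, image_height(f, subject, d), image_height(f, background, d + gap))
```

In each row the subject's image height is unchanged, while the background's image height grows; this mismatch is the source of the unsettling effect.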

If the camera pulls back slowly (as in a diner scene in *Goodfellas*), the psychological effect is one of creeping unease. The audience is aware of something being not quite right, but can’t quite place the source of trouble. Often, however, the camera zooms back quickly: in *The Lord of the Rings: The Fellowship of the Ring*, as Frodo stands on a road, the accompanying dolly zoom lasts a fraction of a second, evoking a feeling of terror. It’s no surprise, then, that Michael Jackson’s *Thriller* video ends with a similar, speedy zoom! These sudden zooms are technically difficult and costly, but clearly they are worth the expense and effort to the directors of these films. See Boing Boing (2015), for example, for a video clip purporting to be “23 of the best dolly zooms in cinematic history.”

Of course, the effect can be reversed (even with a virtual camera); at the end of Fiona’s battle with Robin Hood’s men in the animated movie *Shrek*, there is a split-second reverse dolly zoom, giving the sudden impression that the battle is over and all is right with the world.

### Anamorphic Art

The word “LIFE” in Fig. 7 looks moderately distorted because of the unusually close viewing distance, but the word is still recognizable because the viewing target (at the “×”) is centered on the horizon. That is, if we hold this picture in front of us, we’ll be centered on the viewing target; the distortion comes solely from the distance between our eye and that target.

[…] *anamorphism*, appears in a 1533 painting by the German-born artist Hans Holbein the Younger. *The Ambassadors* (Fig. 11) appears to show a wealthy landowner and a bishop surrounded by objects both secular and religious. Toward the bottom of the painting is an odd gray-and-black smear; this smear is in fact meant to be viewed from the extreme right edge of the painting. A viewer standing at this extreme angle would not be able to see the men and their possessions clearly, but would clearly be able to see a skull hidden in plain view within the painting (Fig. 12).

Anamorphic art is hardly confined to the sixteenth century; it abounds today in curated museum shows, in public spheres (e.g., in the New York subway system), and in art-gone-viral (just perform an Internet search for the sidewalk art of the chalk artist Julian Beever (Beever 2019)). See “Anamorphosis: Between Perspective and Catoptrics” for a fuller treatment of the topic. Anamorphism has its practical aspects, too: turn arrows painted on roadways look highly distorted when seen from directly above but appear correct to the drivers approaching along the road. There are parking garages that paint anamorphic exit signs, which make sense to the drivers needing to leave the building but appear to be a jumble otherwise.

### Impossible Figures

[…] *Penrose triangle* (Fig. 13), popularized in the 1950s by the father-son team of Lionel and Sir Roger Penrose, a psychologist and a mathematician, respectively.

This triangle is one of the simplest and most iconic examples of what we call “impossible figures.” Locally, at each corner of the object, this appears to be the image of a solid three-dimensional object made of flat surfaces with linear edges. But the object as a whole contradicts the local analysis. For example, as we travel around the object counter-clockwise, each subsequent corner appears to be closer to the viewer than the previous one – an impossibility in a closed loop!

Many artists include “impossible figures” in their work, including Swedish artist Oscar Reutersvärd – who is credited with the 1930s discovery of the triangle that would later bear the Penrose name – and M.C. Escher, whose *Waterfall*, *Ascending and Descending*, and *Belvedere* (among many others) have captivated and perplexed generations of curious viewers.

The observation that the same object (such as the Penrose Triangle sculpture above) can have very different appearances when viewed from two different locations is one of the reasons that reconstructing three-dimensional objects from two-dimensional photographs is so challenging. This challenge is the focus of the next section.

## Going Backward from Pictures to 3D

In the centuries that saw Desargues, Canaletto, and Poncelet, the task of drawing accurate images and maps was a significant technological challenge. But in today’s world – where cameras are built into cell phones – accurate images surround us. The ubiquity of digital images has allowed us to attempt new technological challenges of our day: to recreate a three-dimensional world from a collection of photographs.

Entering the search term “Rome” on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.

We are all familiar with computer games that allow us to move through a virtual 3D world, and also with online sites (such as Google Maps) that allow us to virtually “move” through city streets while seated at our computers. These newly familiar experiences rely on already knowing the structure of space. Virtual gaming worlds have a three-dimensional structure already encoded into the software; Google Maps takes images from satellite or roving, calibrated cameras with GPS coordinates encoded into the image.

What makes the work of Agarwal et al. a geometrical challenge is the almost complete absence of a priori geographic or spatial information. Piecing the world back together from a random collection of photographs is like fitting together a jigsaw puzzle with a million pieces, some of which are missing and many of which are redundant. (Almost no one takes pictures of the dumpster behind the grocery store; millions of people take photographs of a famous statue.)

1. identifying feature points or lines that match across images;
2. doing a reconstruction from pairs or possibly triplets of images; and
3. piecing together and refining these many reconstructions using optimization.

The first step requires careful use of a cluster of computers, one of which is designated as the “master node” that distributes images to individual computers (nodes) in a balanced manner. The nodes each toil away at pre-processing images by verifying they are readable and extracting available camera information (if any is attached). The process of matching images is not entirely random; in the same way most people begin solving a jigsaw puzzle by looking for edge pieces, the matching algorithm uses a library of SIFT (Scale Invariant Feature Transform) features.

Likewise, the third (final) step makes intensive use of computational algorithms that are outside the scope of this chapter.

Step two is where projective geometry comes in; this step requires “undoing” the kind of perspective map that Desargues and Canaletto mastered long ago. This step is the basis for the field of *multiple view geometry*, an increasingly fertile area of research for theoretical and applied mathematicians alike. Indeed, the author was introduced to this subject at an energetic week-long gathering of university professors and Google engineers at a conference on *Algebraic Vision* hosted by the American Institute of Mathematics in summer 2016.

To describe the locations of real-world points and their photographic images in a way that is amenable to computer algorithms, we will need to understand *homogeneous coordinates* for space; that is the subject of the next section.

## Homogeneous Coordinates

To motivate the use of homogeneous coordinates (as contrasted with Cartesian coordinates), we return to the notion of an observer positioned at the origin \(\textbf {0} = (0,0,0)\in \mathbb {R}^3\), gazing at the world through a picture plane. To this observer, every point along a given line of sight will map to the same point on the picture plane. In particular, points (*x*, *y*, *z*) and (*λx*, *λy*, *λz*) have the same image whenever *λ* ≠ 0. If the picture plane is *z* = 1, for example, then both of these points map to \(\displaystyle \left ( \frac xz, \frac yz, 1 \right )\).

This observation motivates the definition of \(P^2({\mathbb {R}})\), the *projective plane*, as equivalence classes of points in \(\mathbb {R}^3\setminus \{ \textbf {0} \}\). A point in \(P^2({\mathbb {R}})\) can be written in homogeneous coordinates in the form \(\left [ x : y : z \right ]^T\) for \((x,y,z)\in \mathbb {R}^3\setminus \{\textbf {0} \}\); we say \(\left [ x : y : z \right ]^T = \left [ \lambda x : \lambda y : \lambda z \right ]^T\) whenever *λ* ≠ 0. Just as points in \(P^2({\mathbb {R}})\) correspond to real lines through the origin, lines in \(P^2({\mathbb {R}})\) correspond to real planes through the origin. Said another way, projective points \(\left [ {x_1} : {y_1} : {z_1} \right ]^T\), \(\left [ {x_2} : {y_2} : {z_2} \right ]^T \), and \(\left [ {x_3} : {y_3} : {z_3} \right ]^T \) are collinear in \(P^2({\mathbb {R}})\) precisely when the real points (*x*_{1}, *y*_{1}, *z*_{1}), (*x*_{2}, *y*_{2}, *z*_{2}), and (*x*_{3}, *y*_{3}, *z*_{3}) are coplanar with the origin in \(\mathbb {R}^3\).
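Three vectors in \(\mathbb {R}^3\) lie in a common plane through the origin exactly when the determinant of the matrix they form is zero, which gives a direct computational test for collinearity in \(P^2({\mathbb {R}})\). The following sketch uses integer coordinates so that the comparison with zero is exact; the function names and sample points are illustrative assumptions.

```python
def det3(p, q, r):
    """Determinant of the 3 x 3 matrix whose rows are p, q, r."""
    return (p[0] * (q[1] * r[2] - q[2] * r[1])
            - p[1] * (q[0] * r[2] - q[2] * r[0])
            + p[2] * (q[0] * r[1] - q[1] * r[0]))


def collinear(p, q, r):
    """True when three points of P^2(R), given by integer homogeneous
    coordinates, lie on one projective line."""
    return det3(p, q, r) == 0


print(collinear((1, 1, 1), (2, 3, 1), (3, 5, 1)))   # True: all lie on y = 2x - 1
print(collinear((1, 1, 1), (4, 6, 2), (1, 2, 0)))   # True
print(collinear((1, 1, 1), (2, 3, 1), (3, 6, 1)))   # False
```

In the second call, `(4, 6, 2)` is another name for the point `(2, 3, 1)`, and `(1, 2, 0)` has last coordinate zero: it records the direction of the line *y* = 2*x* − 1 rather than a point of the plane *z* = 1.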

The projective plane \(P^2({\mathbb {R}})\) and the Extended Euclidean plane \(\mathbb {E}^2\) (see section “A New Mathematical Object: The Point of Projective Geometry”) have a natural correspondence. If we think of \(\mathbb {E}^2\) as the extension of the particular plane *z* = 1, then we can identify the projective point \(\left [ a : b : c \right ]^T \) with the ordinary point \((\frac ac, \frac bc, 1)\) whenever *c*≠0; projective points of the form \(\left [ a : b : 0 \right ]^T \) correspond to ideal points in \(\mathbb {E}^2\). This makes some intuitive sense, as these correspond to the observer’s lines of sight that are parallel to the picture plane, and so “intersect” the plane *z* = 1 “at infinity”.
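The correspondence just described is easy to express in code. The sketch below, with a hypothetical function name and return format, sends a homogeneous triple either to an ordinary point of the plane *z* = 1 or to an ideal point; exact fractions are used so that different representatives of the same projective point compare equal.

```python
from fractions import Fraction


def to_extended_plane(a, b, c):
    """Map [a : b : c]^T to a point of E^2 (the extension of the plane z = 1).

    Ordinary points are returned as ("ordinary", (a/c, b/c)); points with
    c = 0 are reported as ideal, labelled by their direction (a, b).
    """
    if (a, b, c) == (0, 0, 0):
        raise ValueError("[0 : 0 : 0] is not a point of the projective plane")
    if c == 0:
        return ("ideal", (a, b))
    return ("ordinary", (Fraction(a, c), Fraction(b, c)))


print(to_extended_plane(2, 4, 2))     # the ordinary point (1, 2)
print(to_extended_plane(-3, -6, -3))  # the same ordinary point (1, 2)
print(to_extended_plane(1, 2, 0))     # an ideal point, in direction (1, 2)
```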

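Similarly, we can describe points of projective space \(P^3({\mathbb {R}})\) by homogeneous coordinates \(\left [ x : y : z : w \right ]^T\) for \((x,y,z,w)\in \mathbb {R}^4\setminus \{\textbf {0} \}\), with \(\left [ x : y : z : w \right ]^T = \left [ \lambda x : \lambda y : \lambda z : \lambda w \right ]^T\) whenever *λ* ≠ 0. As before, we can find a natural correspondence between \(P^3({\mathbb {R}})\) and \(\mathbb {E}^3\) (say, via the identification using *w* = 1). These homogeneous coordinates underlie much of the field of *analytical projective geometry*.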

[…] *z* = *d*, *w* = 0, which we think of as an embedding of \(P^2({\mathbb {R}})\subset P^3({\mathbb {R}})\). To such an observer, the point \(\left [ x : y : z : w \right ]^T \) would have an image on the window located at […]. In general, we define a *camera* to be a 3 × 4 matrix *P*. Moving the viewer, shifting the film, rotating the image plane, or using a camera with non-square pixels has the effect of changing the entries of the camera matrix *P*. (See Hartley and Zisserman (2003, Chapter 6) for a fuller description.)
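As a hedged illustration of a camera as a 3 × 4 matrix, the sketch below assumes the simplest setup: an eye at the origin and a window in the plane *z* = *d*. Under that assumption, one standard choice of matrix sends \(\left [ x : y : z : w \right ]^T\) to \(\left [ dx : dy : z \right ]^T\), which corresponds to the window point (*dx*/*z*, *dy*/*z*). The function names are hypothetical, and this particular matrix is an assumption rather than the one displayed in the original text.

```python
def apply_camera(P, X):
    """Multiply a 3 x 4 camera matrix P by a homogeneous point X of P^3(R)."""
    return tuple(sum(p * x for p, x in zip(row, X)) for row in P)


def pinhole_camera(d):
    """A sketch camera: eye at the origin, window in the plane z = d (assumed)."""
    return [[d, 0, 0, 0],
            [0, d, 0, 0],
            [0, 0, 1, 0]]


P = pinhole_camera(2)
print(apply_camera(P, (3, 6, 4, 1)))    # (6, 12, 4), i.e. the window point (1.5, 3)
print(apply_camera(P, (6, 12, 8, 2)))   # (12, 24, 8), the same projective point
print(apply_camera(P, (1, 0, 1, 0)))    # (2, 0, 1): vanishing point of direction (1, 0, 1)
```

The last call applies the camera to an ideal point (*w* = 0); its image is an ordinary point of the window, which is exactly the vanishing point for lines in that direction.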

## Multiple View Geometry

How do we recover information about a three-dimensional world from two-dimensional images?

Suppose we have two images of the same real-world object. Usually, one of the first steps in reconstruction of the 3D scene is to determine what is called the *fundamental mapping* taking points in the first image *α* to a certain set of lines in the second image *β*. The description below explains how and why this mapping emerges.

We say points *x*_{α} ∈ *α* and *x*_{β} ∈ *β* are *corresponding points* if they are images via the appropriate maps of a common point \(X\in P^3({\mathbb {R}})\). That is, *X* projects onto *x*_{α} ∈ *α* from the point *O*_{α}, and *X* projects onto *x*_{β} ∈ *β* from the point *O*_{β}. Then the five points *X*, *x*_{α}, *O*_{α}, *x*_{β}, and *O*_{β} are necessarily coplanar. Note that the line (*O*_{α}*O*_{β}) – called the *epipolar line* – lies in every such plane constructed from corresponding points. Of particular interest along this line are the *epipolar points* *e*_{α} = *α* ⋅ (*O*_{α}*O*_{β}) and *e*_{β} = *β* ⋅ (*O*_{α}*O*_{β}). We can think of *e*_{α} as the image in *α* of the camera at *O*_{β}, and *e*_{β} as the image in *β* of the camera at *O*_{α}.

A given point *x*_{α} might correspond to several different points in the plane *β*. For example, the camera at *O*_{α} might appear to show a tree growing out of a person’s head: the point *x*_{α} could come from both the person’s hat and the trunk of the tree. The images of the hat and trunk in another photograph *β* might not coincide with each other, but because of the coplanar relationship described in the previous paragraph and illustrated in Fig. 17, they must be collinear with the epipolar point *e*_{β}. Accordingly, a pair of photographs of the same scene, taken from two different camera locations, describes a function from points *x*_{α} in *α* to lines (*e*_{β}*x*_{β}) in *β*. This function is called the *fundamental mapping*.

Because *x*_{α} and *x*_{β} can be thought of as points in \(P^2({\mathbb {R}})\), we can represent the fundamental mapping with a 3 × 3 matrix *F*, called the *fundamental matrix*. In general, we can determine *F* from 7 pairs of corresponding points in general position (a 3 × 3 matrix defined up to scale has 8 degrees of freedom, and the rank-2 condition det *F* = 0 removes one more, leaving 7). For each of these corresponding pairs of points, the mapping satisfies *Fx*_{α} = (*e*_{β}*x*_{β}); that is to say, \(x_\beta ^T F x_\alpha = 0\).

If we can determine *F* for a pair of images *α* and *β*, then the fundamental matrix allows us to describe a relationship between the two cameras *P*_{α} and *P*_{β} which created the two images. Why is this? For any point \(X\in P^3({\mathbb {R}})\), we have \((P_\beta X)^T F (P_\alpha X) = 0\).
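The relationship between *F* and the two cameras can be checked on a small example. The sketch below assumes a deliberately simple pair of cameras (the first with its center at the origin, the second differing only by a hypothetical translation `t`); for this special pair, one can take *F* to be the matrix of the cross product with `t`. The code verifies that each image point *x*_{β} lies on the line *Fx*_{α}, that the epipolar point *e*_{β} lies on it too, and that this line is the join of *e*_{β} and *x*_{β}.

```python
def cross(u, v):
    """Cross product; for homogeneous points of P^2(R) it gives the line joining them."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return tuple(dot(row, v) for row in M)


t = (1, 2, 3)    # hypothetical translation between the two cameras
P_alpha = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_beta = [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]]]

# For this special pair of cameras, F sends x to the cross product of t with x.
F = [[0, -t[2], t[1]],
     [t[2], 0, -t[0]],
     [-t[1], t[0], 0]]

e_beta = mat_vec(P_beta, (0, 0, 0, 1))   # image in beta of the camera center O_alpha

for X in [(2, -1, 5, 1), (0, 4, 7, 1), (3, 3, -2, 1)]:
    x_alpha, x_beta = mat_vec(P_alpha, X), mat_vec(P_beta, X)
    line = mat_vec(F, x_alpha)           # the line F x_alpha in the image beta
    print(dot(line, x_beta), dot(line, e_beta), line == cross(e_beta, x_beta))
    # each row prints: 0 0 True
```

For general cameras the matrix *F* is not this simple, and in practice it is estimated from matched points rather than written down from known cameras.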

How, then, do we use the fundamental matrix to reconstruct the real-world scene? The answer is not simple, as the figure of the Ames room below shows.

### The Ames Room

Said another way, the Ames room is *projectively equivalent* to a normal room; there is a *collineation* \(P^3({\mathbb {R}})\to P^3({\mathbb {R}})\) (a function that takes points to points and lines to lines) that maps the Ames room onto a normal, rectangular room. For this reason, the methods described above can determine the relationship between two cameras – and thereby the reconstruction of the three-dimensional scene – only up to projective equivalence. The fundamental mapping by itself can help us distinguish between an Ames room and an A-frame house, but it can’t tell an Ames room from a regular rectangular room. We can’t extract distance or angle measurements of real-world objects without a priori information about the scene or the cameras.
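This ambiguity can be seen directly in coordinates: if *H* is an invertible 4 × 4 matrix, then replacing every scene point *X* by *HX* and every camera *P* by *PH*⁻¹ leaves every image unchanged. The sketch below checks this for two hypothetical cameras and one hypothetical collineation, using exact fractions so that equality is not blurred by rounding.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# A collineation of P^3(R) that is not affine (it moves the plane w = 0),
# together with its inverse. Both matrices are hypothetical examples.
H = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
     [Fraction(1, 4), 0, Fraction(1, 2), 1]]
H_inv = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
         [Fraction(-1, 4), 0, Fraction(-1, 2), 1]]

P_alpha = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # two sketch cameras
P_beta = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
X = [3, 6, 4, 1]                                        # a point of the "true" scene

X_distorted = mat_vec(H, X)                             # the same point in the distorted scene
for P in (P_alpha, P_beta):
    P_distorted = mat_mul(P, H_inv)                     # the matching distorted camera
    print(mat_vec(P, X) == mat_vec(P_distorted, X_distorted))   # True
```

Both photographs are identical for the two scenes, so no amount of image data alone can tell them apart.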

### Reconstructing Objects from Images

Knowing real-world information vastly increases the ease with which we can reconstruct objects from images. A “calibrated camera” makes the reconstruction process much simpler. For instance, many modern digital cameras can encode GPS information into the image. For even more accuracy, many 3D scanners use a known camera that is a fixed distance from a turntable rotating at known angles. Knowing the focal length of the camera allows us to account for phenomena such as the dolly zoom; knowing the viewing target allows us to account for anamorphic effects (see section “A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism”).

Real-world information about the scene itself is useful as well. Note that in analyzing Canaletto’s painting in section “Where Was the Camera?,” we used standard observations about real-world parallel lines, and also about real-world perpendicular lines, to gain information about Canaletto’s viewing position. In general (meaning, if the scene is not an Ames-room-like scene), this kind of assumption means that reconstructing scenes with architectural features is simpler than, say, reconstructing landscapes. We can see the importance of knowing such geometric information for understanding drawings like “LIFE” (Fig. 7) or the Penrose sculpture (Fig. 13).

In the analysis of Canaletto’s painting, we also used information about proportions (by assuming the arch was a semicircle) and about actual size (e.g., the heights of the people pictured). This kind of detective work is another part of reconstruction; without it, we can’t distinguish between photographs of, for example, a single-family home and a doll’s house.
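The ambiguity between a house and a doll’s house can be made explicit with a short calculation. As an assumption for illustration (this notation is not from the chapter), write a camera with center \(C\), orientation \(R\), and internal calibration \(K\) as the map \(X\mapsto KR(X-C)\) on Euclidean scene points, with image points taken up to a nonzero scalar (written \(\sim\)). Scaling both the scene and the camera position by a factor \(s>0\) gives

\[KR(sX - sC) = s\,KR(X - C) \sim KR(X - C),\]

so the image is unchanged. Only an object of known size in the picture, or a known distance between camera positions, can fix \(s\).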

In practice, the task of reconstruction is further complicated by “noise” and error: points are infinitesimal, but pixels are discrete and finite. So optimization and error analysis also enter into reconstruction algorithms.
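A small experiment suggests why error analysis is unavoidable. The sketch below assumes a hypothetical pair of cameras with calibration matrix `K`, rotation `R`, and translation `t` (none of these values come from the chapter), builds the fundamental matrix from the standard formula \(F = K^{-T}[t]_{\times}RK^{-1}\), and measures how far each point of *β* lies from its epipolar line before and after the image coordinates are rounded to whole pixels.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals the cross product v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def epipolar_distance(F, xa, xb):
    """Distance, in pixels, from each point xb to the epipolar line F @ xa."""
    lines = F @ xa
    return np.abs(np.sum(xb * lines, axis=0)) / np.hypot(lines[0], lines[1])

# Hypothetical camera parameters (assumptions for illustration only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-1.0, 0.0, 0.2])
K_inv = np.linalg.inv(K)
F_true = K_inv.T @ skew(t) @ R @ K_inv

# Fifty random scene points in front of both cameras.
rng = np.random.default_rng(2)
X = np.vstack([rng.uniform(-1, 1, (2, 50)), rng.uniform(4, 8, (1, 50))])
xa = K @ X
xa = xa / xa[2]
xb = K @ (R @ X + t[:, None])
xb = xb / xb[2]

# Ideal (real-valued) image points satisfy the constraint to machine precision.
exact = epipolar_distance(F_true, xa, xb)

# Rounding to whole pixels breaks the exact incidence.
xa_pix, xb_pix = xa.copy(), xb.copy()
xa_pix[:2] = np.round(xa[:2])
xb_pix[:2] = np.round(xb[:2])
pixel = epipolar_distance(F_true, xa_pix, xb_pix)
print(exact.max(), pixel.max())
```

With exact coordinates the distances vanish up to floating-point error. After rounding they do not, which is why practical algorithms minimize such distances over many correspondences rather than solving the equations exactly.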

Nonetheless, at the heart of any reconstruction lies the language of homogeneous coordinates and analytical projective geometry.

## Conclusion

The long and storied history of projective geometry weaves itself through the last half-millennium of mathematics; it is a subject that has been discovered and rediscovered by mathematicians searching for answers beyond Euclidean geometry. Its reemergence under Poncelet points to the aesthetic elegance of its axiomatic structure; the subject has also led to deeper understandings of conics (e.g., under the influence of Steiner) and of topology (e.g., under Möbius).

But its utility in perspective drawings and photographs is where the subject of projective geometry becomes most applied and touches our lived experiences most directly. With the passing of time, this tool is becoming even more relevant and powerful than when Desargues first introduced it. We live in a world that is increasingly visual, a world in which technology creates, reproduces, and alters images constantly; analytic projective geometry is the machinery that allows us to create, explain, and analyze these digitized images.

Beyond the technical aspect of analyzing digital images, constructive projective geometry gives us all a way to see our surroundings and the objects in them: to better understand how to look at paintings or our vacation photographs, to create or to dispel illusions, and to interpret the way we look at our wonderful, three-dimensional world.

## References

- Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz SM, Szeliski R (2011) Building Rome in a day. Commun ACM 54(10):105–112. With a Technical Perspective by Prof. Carlo Tomasi
- Andersen K (2006) The geometry of art: the history of the mathematical theory of perspective from Alberti to Monge. Springer, New York
- Andrevruas (2011) Português: Casa construída de forma a fazer a pessoa parecer grande ou pequena dependendo da perspectiva, na cidade do Rio de Janeiro, 24 Jan 2011. https://commons.wikimedia.org/wiki/File:Casaperspectiva.jpg, from Wikimedia Commons
- Beever J (2019) Julian Beever’s website. http://www.julianbeever.net/
- Boing Boing (2015) Watch 23 of the best dolly zooms in cinematic history, 26 Jan 2015. https://boingboing.net/2015/01/26/watch-23-of-the-best-dolly-zoo.html
- Bosse A (1648) Manière universelle de Mr. Desargues, pour pratiquer la perspective par petit-pied, comme le Geometral, Paris
- Byers K, Henle J (2004) Where the camera was. Math Mag 77(4):251–259
- Canaletto GA (circa 1730) The Clock Tower in the Piazza San Marco. https://commons.wikimedia.org, oil on canvas, 69.22 × 86.36 cm, current location at the Nelson-Atkins Museum of Art
- Carroll L (1871) Through the looking-glass. Macmillan & Co, London
- Crannell A (2006) Where the camera was, take two. Math Mag 79(4):306–308
- Desargues G (1987) Exemple de l’une des manieres universelles du s.g.d.l. touchant la pratique de la perspective sans emploier aucun tiers point, de distance ny d’autre nature, qui soit hors du champ de l’ouvrage. In: The geometrical work of Girard Desargues. Springer, New York, p 1636
- Deutsches-Technikmuseum (2008) Penrose triangle sculpture. https://commons.wikimedia.org/w/index.php?curid=3597501, images from Wikimedia Commons
- Futamura F, Lehr R (2017) A new perspective on finding the viewpoint. Math Mag 90(4):267–277
- Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, New York
- Holbein H (1533) The Ambassadors. https://commons.wikimedia.org, oil on oak, 209.5 cm × 207 cm
- Robin AC (1978) Photomeasurement. Math Gaz 62:77–85
- Taylor B (1719) New principles of linear perspective: or the art of designing on a plane the representations of all sorts of objects, in a more general and simple method than has been done before, London
- Tripp C (1987) Where is the camera? The use of a theorem in projective geometry to find from a photograph the location of a camera. Math Gaz 71:8–14