Introduction

Optics is one of the rare gems of physics where some principles and ideas developed in ancient times, from Euclid to al-Haytham, still sustain part of the current models of how light behaves [1,2,3,4]. These ideas made it possible to build simple optical instruments such as glasses and telescopes [5,6,7]. A few hundred years ago, the advances and ideas presented by a collection of great experimentalists, such as Grimaldi, Young, Fresnel, and Arago, prevailed against Newton's supporters and established the concept of light as a wave. This image of light as a wave was fully understood by Maxwell, who is recognized as the father of electromagnetism. A few decades later, at the beginning of the twentieth century, light was again seen as a flux of particles, as Newton postulated, but now escorted by a sound, robust model in the form of the quantum theories of light and flanked by the brilliant minds of Planck, Einstein, and so many others. All these models are taught in colleges and universities because they predict how light interacts with matter and with itself. Even more, geometrical, electromagnetic, and quantum optics live together in peaceful harmony: every optics and photonics book contains these three models [8,9,10,11,12].

Within the scope of this chapter, we will mostly remain within the comfort zone of geometrical optics. When necessary, we will jump to wave optics to better understand the notion of wavefront aberrations and how they describe the deviation from the perfect object-image correspondence. Every optical system designed to generate an image from a given object requires interfaces and materials where light behaves differently. These image-forming systems collect the light coming from the object and deliver it to the detection area, where it is registered by a variety of mechanisms—from the chemical reactions caused in photographic films to the bio-chemical response given by the specialized light-sensitive cells in the retina. As a first approach, this process can be described by considering how light travels along geometrical paths, or light rays. These rays are bent by the optical system to finally reach their destination such that, ideally, every ray departing from a point in the object arrives at a single corresponding point in the image. One of the key elements in electromagnetic optics is the definition of the electromagnetic spectrum, where light is modeled as an electromagnetic wave. The electromagnetic spectrum classifies electromagnetic waves in terms of their wavelength, λ, given as the spatial distance between equivalent oscillatory states, and frequency, ν, related to the temporal rhythm of oscillation. The relation between them is λ = v/ν, where v is the speed of propagation of the electromagnetic wave. This means that a shorter wavelength corresponds to a higher frequency. At this point, it is interesting to note that, within the quantum model of light, an electromagnetic wave having a frequency ν can be represented by a collection of photons. Each photon carries a tiny amount of energy that obeys Planck's relation, E = hν, where h is Planck's constant (h = 6.6261 × 10⁻³⁴ J·s), meaning that the higher the frequency, the higher the energy carried by each associated photon. The visible optical range covers the values λ ∈ (380, 780) nm, where the lower limit corresponds to the violet color and the upper limit to red. In between them, we have the spectral chromatic gamut seen in the rainbow. The visible wavelengths correspond to frequency values in the hundreds of terahertz (∼10¹⁴ Hz). The visible range is bounded by the ultraviolet (λ ∈ [100, 380] nm) and the infrared (λ ∈ [0.78, 100] μm) ranges.
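As a quick numerical check of these relations, the short Python sketch below converts a wavelength into its frequency and photon energy (the 555 nm example value is our own choice for illustration):

```python
# Relations between wavelength, frequency, and photon energy (in vacuum).
c = 299_792_458.0       # speed of light in vacuum, m/s
h = 6.6261e-34          # Planck's constant, J*s

wavelength = 555e-9     # a green wavelength near the peak of eye sensitivity, m
frequency = c / wavelength          # nu = c / lambda, Hz
photon_energy = h * frequency       # E = h * nu, J

print(f"nu = {frequency:.3e} Hz")     # ~5.4e14 Hz, i.e., hundreds of THz
print(f"E  = {photon_energy:.3e} J")  # ~3.6e-19 J per photon
```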

As a final comment in this Introduction, we wonder: how exact is exact? The correct definition of approximations, boundary conditions, and limitations is deeply woven into the fabric of physics, and therefore into optics too. Guided by the scientific method, physics has developed models and theories to understand how nature behaves. The scientific method continuously challenges the current theories to find cracks and exceptions, in order to build a more complete model that, one more time, requires scrutiny and discussion from scientists. So, an exact quantity is always accompanied by an error bar (and most of the time, even error bars are affected by uncertainties). Our purpose here is to present how optics helps to build a clearer picture of what light is and how optical image-forming systems behave. The certainty of the models should be confronted with the accuracy needs of the given application. We will also peek at what lies beyond a given approximation—only with the necessary math and formalisms—to improve the understanding of optics and image-forming systems.

Optical Materials and Geometry

The first and simplest approach to optics is made using geometry. Here, light travels along spatial trajectories known as light rays, and the problem is how to use these rays to describe the image-forming capabilities of optical systems. Very little attention is paid to the energy carried by light and to some other important characteristics, such as wavelength and polarization, unless they actually impact the trajectories of the light rays. Geometry also requires some help from materials physics when defining optical parameters, such as the index of refraction or the Abbe number, and it also borrows the wavelength concept from electromagnetism to explain the chromatic behavior of optical systems. In any case, geometry governs the propagation of light in such a manner that it becomes the first approach in any optical analysis to obtain the location and characteristics of the image given by an optical system.

The Index of Refraction

When considering image-forming systems, the materials used to build optical instruments should be as transparent as possible to minimize the amount of energy lost along the light trajectories. Still, they interact with light in a more subtle manner, modifying the speed of light within them. We all know that light travels at the highest possible velocity, c = 299,792,458 m/s, when propagating in vacuum. However, when passing through transparent media, light slows down significantly. Actually, one of the optical parameters that defines the light–matter interaction is the ratio between c and the speed of light in the material, v, known as the index of refraction:

$$ n=\frac{c}{v}. $$
(4.1)

Every optical material is characterized by its index of refraction. The lowest possible value is n = 1, which corresponds to the case of vacuum, where v = c. The value of n depends on the composition of the material. For example, because of their low density, gases (including air) have an index of refraction very close to 1. The index of refraction of water is nwater ≃ 1.333, and most optical glasses lie in the range n ∈ (1.4, 1.8). Moreover, the index of refraction is wavelength-dependent: n = n(λ). This means that different spectral colors behave differently when propagating through optical media. To parameterize this dependence, we define another important variable, the Abbe number, given as:

$$ V=\frac{n_{\mathrm{d}}-1}{n_{\mathrm{F}}-{n}_{\mathrm{C}}}, $$
(4.2)

where nd is the index of refraction at the wavelength λd = 587.6 nm, close to the location where the human eye is most sensitive (λ = 555 nm), and nF and nC are the indices of refraction at two wavelengths (λF = 486.1 nm, λC = 656.3 nm) located in the blue and red regions of the visible spectrum, respectively. The Abbe number quantifies how large the change of the index of refraction is with respect to the wavelength. Then, by providing the index of refraction and the Abbe number of a material, we have quite a good idea of how an optical material behaves in the visible spectrum. The human lens is not an exception to this and presents an Abbe number that varies between Vlens,min = 45.6 and Vlens,max = 47.3, corresponding to the low and high index of refraction of the human lens, nlens ∈ (1.386, 1.406), respectively [13].
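As a minimal illustration, the following sketch evaluates Eq. (4.2); the three indices are approximate values for a common crown glass (BK7-like) and are given only as an example:

```python
# Abbe number from the three standard indices (Eq. 4.2).
def abbe_number(n_d: float, n_F: float, n_C: float) -> float:
    """V = (n_d - 1) / (n_F - n_C); a larger V means weaker dispersion."""
    return (n_d - 1.0) / (n_F - n_C)

# Approximate indices for a BK7-like crown glass (illustrative values):
print(abbe_number(1.5168, 1.5224, 1.5143))  # ~64, a low-dispersion glass
```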

If the material is not transparent, the index of refraction becomes a complex number, ñ = n − ik. The value k, its imaginary part, describes the absorption of light produced along the propagation. This absorption is also a function of λ, giving rise to colored filters and some other very interesting mechanisms of interaction. For example, the human lens shows a very large absorption coefficient in the ultraviolet region, meaning that only a tiny portion of the UV light reaches the retina. This also means that this portion of the optical spectrum is strongly absorbed by the cornea and lens, where it can produce some other unwanted effects.

The index of refraction is of paramount importance when describing how the straight trajectories observed for homogeneous media bend when passing from one material to another (see Fig. 4.1a). This behavior is well-described by Snell’s law:

$$ n\sin \epsilon ={n}^{\prime}\sin {\epsilon}^{\prime }, $$
(4.3)

where n and n′ are the indices of refraction of the materials on both sides of the interface, and ϵ and ϵ′ are the incidence and refraction angles, respectively. In Fig. 4.1b, we can see how this bending, or angular deviation, given as δ = ϵ − ϵ′, works in an optical prism.
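A minimal sketch of Eq. (4.3) in code form; the function name and the air-to-glass example values are our own choices for illustration:

```python
import math

# Snell's law (Eq. 4.3): n sin(eps) = n' sin(eps').
def refraction_angle(n: float, n_prime: float, eps_deg: float) -> float:
    """Return the refraction angle (degrees) for an incidence angle eps_deg."""
    s = n * math.sin(math.radians(eps_deg)) / n_prime
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))

# A ray entering glass (n' = 1.5) from air at 30 degrees:
eps_p = refraction_angle(1.0, 1.5, 30.0)
print(f"eps' = {eps_p:.2f} deg, deviation = {30.0 - eps_p:.2f} deg")  # ~19.47, ~10.53
```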

Fig. 4.1
Three panels illustrating optical physics concepts. Panel a presents light reflection and refraction at a surface, panel b demonstrates light paths through a prism, and panel c graphs the relationship between the angle of incidence and light transmittance and reflectance.

(a) A graphical arrangement of Snell's law (Eq. 4.3), where we represent the incident, the reflected, and the transmitted rays. The vectors i, o, and k are those included in the three-dimensional form of Snell's law (Eq. 4.5). (b) An example of the application of Snell's law to the angular deviation of a prism. (c) Power budget between the transmitted and reflected beams, represented through the transmittance (T) and reflectance (R) as a function of the angle of incidence, ε. This calculation assumes that the incident light has a natural polarization state. The values at ε = 0° are given in Eq. (4.4) and correspond to a case where n = 1 and n′ = 1.5

Also, when considering the amount of light (the power budget) that goes through a given interface between materials, the index of refraction appears in the equations and describes how much energy is reflected and how much is transmitted by the interface (see Fig. 4.1c). These relations are known as the Fresnel equations, which take quite a simple form in the case of normal incidence (ϵ = ϵ′ = 0):

$$ R={\left(\frac{n-{n}^{\prime }}{n+{n}^{\prime }}\right)}^2,\kern3.2em T=\frac{4n{n}^{\prime }}{{\left(n+{n}^{\prime}\right)}^2}, $$
(4.4)

where R and T are the reflectance and transmittance of the interface, respectively. If we calculate the numbers for an interface between air, nair = 1, and the corneal tissue, ncornea = 1.376, we find that T = 0.975 (97.5% of the energy enters the cornea) and R = 0.025 (2.5% of the incident light is reflected).
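These numbers can be verified with a few lines of code (a sketch of Eq. (4.4); for a lossless interface, energy conservation gives T = 1 − R):

```python
# Fresnel power coefficients at normal incidence (Eq. 4.4).
def fresnel_normal(n: float, n_prime: float):
    R = ((n - n_prime) / (n + n_prime)) ** 2
    return R, 1.0 - R   # energy conservation: T = 1 - R

R, T = fresnel_normal(1.0, 1.376)   # air-to-cornea interface
print(f"R = {R:.3f} (reflected), T = {T:.3f} (enters the cornea)")
# R ~ 0.025 and T ~ 0.975, the numbers quoted in the text
```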

We cannot finish this description of the index of refraction without paying special attention to non-homogeneous materials. This is the case of the lens of the human eye. It is well-known that the lens is better described as a graded-index element [14,15,16]. The index of refraction at the core of the lens is the largest and decreases when moving toward the surface. This change is smooth over a limited range, but it bends the light trajectory in quite an efficient way. Therefore, when replacing the human lens by a single-material intra-ocular lens, we are also replacing a biological graded-index material by a polymer having a constant index of refraction over its whole volume. In gradient media, the light trajectories do not follow straight lines, as happens in homogeneous media. The actual propagation of light, within the geometrical model that only considers the trajectory of light, is given as the solution of a mathematical variational problem where a quantity defined as the optical path reaches an extremal point (a maximum or a minimum) [17]. The optical path, L, is defined as the product of the geometrical trajectory, the propagated distance (d), times the index of refraction (n) of the material where light travels, L = nd. This is the same as saying that, for going from point A to point B, light follows the trajectory that requires the minimum possible time. This can be easily understood by remembering that the index of refraction is inversely proportional to the speed of light within the medium, so the larger the index of refraction, the slower the light propagates. Then, a continuous variation of the index of refraction also changes the speed of light continuously as it travels through different portions of the non-homogeneous material, and the time of arrival at a given point changes depending on the trajectory. Nature selects the trajectory along which light spends the shortest time to arrive. All these concepts can be derived mathematically in quite a safe way. Actually, there are some academic solutions, such as the Luneburg lens, a sphere of non-homogeneous material whose index of refraction increases when moving toward the center [18, 19]. In any case, graded-index materials add a new parameter, the variation of the index of refraction, that can be used to improve the image-forming capabilities of an optical system.
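The least-time principle can be illustrated numerically: the sketch below (the geometry and scan resolution are our own choices) finds the interface crossing point that minimizes travel time between two points in different media and shows that the resulting angles satisfy Snell's law:

```python
import math

# Fermat's principle, numerically: light going from A (in medium n1) to B
# (in medium n2) across a flat interface at y = 0 picks the crossing point
# x that minimizes the travel time; the resulting angles obey Snell's law.
n1, n2 = 1.0, 1.5
A = (0.0, 1.0)      # source, 1 m above the interface
B = (1.0, -1.0)     # destination, 1 m below

def travel_time(x: float) -> float:
    t1 = n1 * math.hypot(x - A[0], A[1])   # time is proportional to L = n*d
    t2 = n2 * math.hypot(B[0] - x, B[1])
    return t1 + t2

# Crude brute-force scan over candidate crossing points in [0, 1]:
x_best = min((i / 100000 for i in range(100001)), key=travel_time)
eps1 = math.atan2(x_best - A[0], A[1])     # incidence angle from the normal
eps2 = math.atan2(B[0] - x_best, -B[1])    # refraction angle from the normal
print(n1 * math.sin(eps1), n2 * math.sin(eps2))  # the two products agree
```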

Beyond Paraxial Optics

Why is paraxial optics so important? Because it is robust, simple, and useful. Paraxial ray tracing makes it possible to understand how light travels from objects to images and how objects and images can be real or virtual, larger or smaller, erect or inverted. Therefore, the location and size of the image can be easily obtained from quite a simple calculation or from back-of-the-envelope ray tracing [20, 21].

Besides, paraxial optics assures that an optical system behaves perfectly. The conditions for an image-forming system to be perfect are defined by the three Maxwell's conditions, which represent quite common-sense capabilities for such systems. The first Maxwell's condition states that the image of a point is a point, the second states that the image of a plane perpendicular to the optical axis of an optical system is also a plane, and the third states that the images are proportional to the objects.

Mathematically, the paraxial regime is based on an approximation for the trigonometric functions involved in the propagation of light: sin ϵ ≃ tan ϵ ≃ ϵ, and cos ϵ ≃ 1 (where ϵ is given in radians, not in degrees). This means that Snell's law has a paraxial counterpart, nϵ = n′ϵ′. Therefore, paraxiality is lost when the involved angles (e.g., the incidence and refraction angles) are large enough to surpass the previous approximation, and Snell's law (Eq. 4.3) must be applied in its strict form beyond its paraxial version.

Another useful simplification in the paraxial analysis of optical systems is to consider them as rotationally symmetric. This means that every plane containing the optical axis is equivalent, and any of them is valid to study the system. These planes are named meridional planes. But this condition is easily broken, and rays may behave differently in different meridional planes (e.g., astigmatic or toric lenses are not rotationally symmetric); even more, rays may travel as skew rays through the system. Although some paraxial calculations can be made for astigmatic lenses or systems, if we really need an accurate picture of how light travels through them, we have to use a three-dimensional representation of Snell's law. In this general and more realistic case, Snell's law becomes a slightly more complex relation (see Fig. 4.1a) that involves unitary vectors describing the incoming and outgoing rays (i and o, respectively) and another vector (k), the normal vector, which represents the orientation of the interface and points toward the medium the light is coming from [22]:

$$ n\left|\overrightarrow{i}\times \overrightarrow{k}\right|={n}^{\prime}\left|\overrightarrow{o}\times \overrightarrow{k}\right|, $$
(4.5)

where n and n′ are the indices of refraction of the two materials separated by the interface, and × denotes the cross product. The moduli of these cross products are |i × k| = sin ϵ and |o × k| = sin ϵ′, which retrieves Eq. (4.3) from Eq. (4.5). The geometrical layout of the involved vectors is shown in Fig. 4.1a. Fortunately, computers deal very well with these calculations and can evaluate the propagation of millions of optical rays through an optical system in a reasonable time. These computational capabilities make possible the analysis, and the optimization, of image-forming systems, including the human eye.
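A minimal sketch of this three-dimensional refraction (the function name and conventions are ours; the normal k points toward the incoming medium, as in the text):

```python
import numpy as np

# Vector form of Snell's law (cf. Eq. 4.5): refract a unit ray i at an
# interface with unit normal k pointing toward the incoming medium.
def refract(i: np.ndarray, k: np.ndarray, n: float, n_prime: float):
    mu = n / n_prime
    cos_i = -np.dot(i, k)                     # cosine of the incidence angle
    sin2_t = mu**2 * (1.0 - cos_i**2)         # Snell: sin(t) = mu * sin(i)
    if sin2_t > 1.0:
        return None                           # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return mu * i + (mu * cos_i - cos_t) * k  # outgoing unit vector o

i = np.array([np.sin(np.radians(30)), 0.0, -np.cos(np.radians(30))])
k = np.array([0.0, 0.0, 1.0])                 # normal toward incoming medium
o = refract(i, k, 1.0, 1.5)
print(o, np.degrees(np.arcsin(o[0])))         # refraction angle ~19.47 deg
```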

Some characteristic parameters of optical systems, such as refracting power or focal distance, are well-defined within the paraxial approach, and their meaning remains after surpassing the paraxial domain. Also, the paraxial formalism predicts the location and size of the optical image of a given object provided by an optical system. This is described through the main paraxial image-forming equations, exemplified for a thin lens of focal length f′ immersed in air as:

$$ -\frac{1}{a}+\frac{1}{a^{\prime }}=\frac{1}{f^{\prime }}, $$
(4.6)
$$ M=\frac{a^{\prime }}{a}, $$
(4.7)

where a and a′ are the object and image distances, respectively, and M is the lateral magnification, defined as the ratio between the lateral size of the image, y′, and that of the object, y. These two paraxial equations serve as the first-order approximation to know where and how the image of an object is reproduced by an optical system. Equations (4.6) and (4.7) can be written in terms of vergences (V, V′) and refracting power (P) as −V + V′ = P and M = V/V′, respectively. The sign convention used here defines the distance of a real object as a negative frontal distance, a < 0; meanwhile, a real image has a positive distance, a′ > 0. Vergences follow the same convention and are defined as V = n/a and V′ = n′/a′, where n and n′ are the indices of refraction of the object and image spaces, respectively.
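Equations (4.6) and (4.7) translate directly into code; the following sketch (the numbers are our own example) images a real object through a thin lens in air:

```python
# Paraxial thin-lens imaging in air (Eqs. 4.6 and 4.7):
# -1/a + 1/a' = 1/f', with a < 0 for a real object.
def thin_lens_image(a: float, f_prime: float):
    a_prime = 1.0 / (1.0 / f_prime + 1.0 / a)
    M = a_prime / a
    return a_prime, M

# Real object 300 mm in front of a 100 mm lens (a = -300 by convention):
a_p, M = thin_lens_image(-300.0, 100.0)
print(f"a' = {a_p:.1f} mm, M = {M:.2f}")   # a' = 150 mm, M = -0.5 (inverted)
```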

As we have seen, paraxial optics helps to grasp the main properties of an optical system. However, it fails when describing subtle details related to the quality of the image that lie well beyond the paraxial approach. These discrepancies are known as optical aberrations.

However, the paraxial approach is still valid when analyzing some aberrations related to the dependence of the index of refraction on the wavelength, n = n(λ), the so-called chromatic aberrations. To show this, we present the value of the focal length of a thin lens in air in terms of its material and geometrical parameters:

$$ \frac{1}{f^{\prime }}=\left(n-1\right)\left(\frac{1}{r_1}-\frac{1}{r_2}\right), $$
(4.8)

where r1 and r2 are the radii of curvature of the front and back surfaces, respectively, and n is the index of refraction of the lens material. Now, it is clear that if n varies with λ, the focal distance, f′, changes too. This behavior is split into two parts: a variation in the location of the focal point (longitudinal chromatic aberration) and a variation in the intersection of rays of different wavelengths with the paraxial image plane defined for a given reference wavelength (transverse chromatic aberration). Given this treatment, we could think of chromatic aberrations as paraxial aberrations.
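As an illustration of Eq. (4.8), the sketch below evaluates the focal length at the F, d, and C wavelengths; the radii and the BK7-like indices are our own example values:

```python
# Longitudinal chromatic aberration of a thin lens (Eq. 4.8):
# f'(lambda) changes because n depends on the wavelength.
def focal_length(n: float, r1: float, r2: float) -> float:
    return 1.0 / ((n - 1.0) * (1.0 / r1 - 1.0 / r2))

r1, r2 = 100.0, -100.0   # biconvex lens, radii in mm (illustrative values)
for label, n in [("F (486 nm)", 1.5224), ("d (588 nm)", 1.5168), ("C (656 nm)", 1.5143)]:
    print(label, f"f' = {focal_length(n, r1, r2):.2f} mm")
# Blue focuses closer than red; the spread along the axis is the
# longitudinal chromatic aberration.
```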

Seidel Aberrations

Once we know that optical aberrations describe the discrepancies from paraxial performance, they can be classified and described using several categories [23, 24]. The Seidel classification of aberrations is based on their geometrical meaning (see Fig. 4.2).

Fig. 4.2
Diagrams of optical aberrations. Spherical with light focusing at different points, Coma with asymmetric focal points, Astigmatism with different focal lines, Field Curvature with a curved focus, and Distortion comparing barrel and cushion types.

The five primary Seidel aberrations are presented in this figure. Spherical aberration considers all the rays passing through the aperture of the system. It is sometimes characterized by the longitudinal spherical aberration (LSA) and the transversal spherical aberration (TSA), which compare the impact of the marginal rays with the paraxial ones. Coma is produced when the rays enter the full aperture of the system for an off-axis object. Astigmatism generates the so-called Sturm's conoid, which contains two focal lines with a round spot in between them. Field curvature represents how the location of the image is no longer on a plane but appears on a curved surface, also known as the Petzval surface. Both astigmatism and field curvature consider a narrow pencil of rays. Finally, distortion deforms the location of the image point depending on its distance to the optical axis. In a real optical system, all these aberrations are mixed together

When applying the ray tracing rules to the case of real optical systems, it is possible to classify aberrations depending on the location of the object point (on axis or off axis), the aperture of the system, and the geometry of the optical system with respect to the incoming radiation. Seidel aberrations (spherical, coma, astigmatism, field curvature, and distortion) are depicted in Fig. 4.2 and described below.

To analyze these aberrations, we rely on their relation with the three Maxwell's conditions of a perfect optical system. The first condition is related to the point-like property of the image of an object point source. When it is violated, the optical rays departing from a point source, after propagating through the system, do not intersect at a single point but are distributed on the image plane as a finite-size distribution of impacts. The aberrations violating the first Maxwell's condition are spherical aberration, coma, and astigmatism. Actually, spherical aberration and coma can be seen as two flavors of the same phenomenon. They appear when considering every ray impinging on the entrance pupil of an optical system. The difference between them is that spherical aberration considers an object point source located on the optical axis, whereas coma happens for objects placed at a given distance, or angular deviation, from the optical axis. The third aberration, astigmatism, has a deeper geometrical meaning. It occurs when a narrow pencil of rays strikes a surface that shows two values of its radius of curvature along different planes. To better understand astigmatism, we first need to picture how a given surface may show two different radii of curvature, something that can even happen for a spherical surface. A toric surface is quite a simple example. Let us consider the three-dimensional case of a donut. Every point on its surface shows two curvatures, aligned with the two planes we could use to split the donut in half. If the donut is sliced as a bagel sandwich, the radius of curvature is larger, and the two sections have an "O" shape. When the donut is split to produce two "C" portions, the corresponding curvature at the cutting point has a smaller radius. Both radii of curvature are perpendicular to each other and can generate optical surfaces with different focusing characteristics. This fact is behind every toric lens prescribed to compensate an astigmatic ametropia. Moreover, this geometrical behavior also happens for oblique incidence on a spherical surface and generates oblique astigmatism. In any case, the two radii of curvature generate quite a unique three-dimensional structure known as Sturm's conoid. This behavior produces two focalization planes, where the image of the point source collapses into a segment, and an intermediate plane, where the light spot takes the form of a circle (this spot is also known as the circle of confusion).

The second Maxwell's condition establishes that the image of a plane perpendicular to the optical axis is another plane, also perpendicular to the optical axis. The departure from this condition is described by an aberration called field curvature. It describes how the image plane bends and departs from the paraxial image plane. The first approach to this aberration assumes that the image plane becomes a spherical surface tangent to the paraxial image plane at the optical axis. This surface where the image appears is known as the Petzval surface. This is quite disturbing for many image-forming optical systems where the recording medium is arranged on a flat surface (e.g., a CMOS or CCD focal plane array). However, some optical systems, such as dome cinema projectors or the human eye, can locate the image on a curved surface. Therefore, in the case of the human eye, field curvature should be taken into account when considering the role of optical aberrations in extra-foveal perception. Also, ophthalmic lenses make use of field curvature when optimizing their performance, taking into account the eye movement behind the lens [25, 26].

Finally, the third Maxwell's condition assures that the image is similar to the object. This similarity should be taken in its strictest geometrical sense: the lateral dimensions are proportional, and the angular values are preserved. Distortion is the Seidel aberration that describes how the image is deformed with respect to the object, breaking the similarity condition between the object and the image. Mathematically, it means that the lateral magnification is not constant across the image plane; the effect can be seen as the deformation of a rectangular grid toward a pincushion or a barrel shape.

These descriptions have been developed to better understand the math behind the geometrical problem of image-forming systems. Actually, they can provide simple geometrical relations applicable to the optimization of optical systems. However, Seidel aberrations never appear isolated; they are mixed together in real systems. Even more, when considering the chromatic behavior of optical systems, Seidel aberrations mix with chromatic aberrations to describe the behavior of optical systems working with white light [27].

Wavefront Aberrations

We have explained how geometrical optics may help to understand the actual behavior of an optical system beyond the paraxial approach. Now, to complete the picture, we begin to move toward the electromagnetic model, where light is a wave characterized by its wavelength, λ.

The propagation of light as a wave is better understood if we define and describe the optical wavefront. From an electromagnetic point of view, the wavefront is defined by those points sharing the same value of the phase of the propagating wave. This definition can be visualized through the evolution of the wavefronts emitted from a point source (see Fig. 4.3). The ripples on a pond caused by the impact of a stone propagate from the impact location in circles; when moving to the three-dimensional domain, these circles become spheres, and the wavefront caused by a point source of electromagnetic waves travels at the speed of light in the medium, generating spherical wavefronts if the medium is homogeneous. This picture can be reinforced by assuming that, upon departure from the point source, the light trajectories are accompanied by a time counter (a clock) that measures the travel time. Then, every point on the same wavefront shares the same time, or the same optical path defined previously. The temporal period between ticks of this clock, T, is related to the frequency of the light, ν = 1/T, which is larger for the blue portion than for the red part of the visible spectrum. These spherical wavefronts are deformed after propagating through an interface, and this deformation depends on the change in the index of refraction and also on the geometry of the interface. From this explanation, we can see that a point-like object emits spherical wavefronts. If the optical system were perfect and a point object produced a point image, then the outgoing wavefront exiting the optical system would also be spherical, with its center at the point image (see Fig. 4.3a). Unfortunately, this is not the case for real systems, and the wavefront after the optical system shows deformations with respect to the ideal spherical wavefront centered at the paraxial image point. These discrepancies are described by the wavefront aberration (see Fig. 4.3b). As far as these discrepancies are defined after the optical system, it is customary to evaluate them at the plane of its exit pupil.

Fig. 4.3
2 diagrams compare light propagation through optical systems. Panel a, a perfect optical system where rays and wavefronts perfectly converge on the image plane, while panel b depicts an aberrated optical system with a discrepancy between the reference and aberrated wavefronts.

A point source generates a collection of rays originating at the point-like object and a collection of concentric spherical wavefronts. Rays and wavefronts are perpendicular to each other. In (a) we represent a perfect system, which transforms spherical wavefronts into spherical wavefronts that collapse at the image point. When the system is aberrated, as represented in (b), the output wavefront is distorted and the rays leaving the system do not intersect at a single point on the image plane. The difference between the aberrated wavefront and the ideal, spherical, wavefront is the wavefront aberration

Graphically, the wavefront aberration is a map that shows the local differences between the actual wavefront and the reference spherical wavefront at the exit pupil. If the system were perfect, the wavefront aberration would be constant and null across the exit pupil [28, 29]. In most cases, the wavefront aberration is a smooth and continuous function defined within a circle having a radius equal to that of the exit pupil. Fortunately, a family of basic mathematical functions, known as Zernike polynomials, Zj, comes to the rescue, showing how simple contributions combine to produce any arbitrary wavefront aberration function. By doing this, the general wavefront aberration W(ρ, θ) is decomposed as a superposition of Zernike polynomials. Some of the basic Zernike polynomials are easily linked with the Seidel aberrations, and their coefficients in the expansion, cj, quantify the importance of the corresponding term, Zj. Mathematically, this can be written as:

$$ W\left(\rho, \theta \right)=\sum \limits_{j=1}^N{c}_{\mathrm{j}}{Z}_{\mathrm{j}}\left(\rho, \theta \right), $$
(4.9)

where we have used polar coordinates (ρ, θ) with the origin at the center of the exit pupil. The mathematical form of the Zernike polynomials can be found elsewhere. In this contribution, we follow the notation presented by the Optical Society of America, where a single index j is used to denote a given polynomial [30]. An arbitrary Zernike polynomial can be seen as the product of a polynomial in ρ times a sine or cosine function with an argument related to an integer multiple of θ. Then, the radial dependence is described by the polynomial in ρ, and the azimuthal dependence takes the form cos(mθ) or sin(mθ) (for some polynomials, the azimuthal dependence does not exist, and the Zernike polynomial shows rotational symmetry around the center of the exit pupil). In Fig. 4.4, we show the maps of the first 15 Zernike polynomials, organized in increasing order as we move downward and related to the classical Seidel aberrations when possible. Each row contains polynomials of the same order (e.g., the fourth row includes four polynomials of third degree, i.e., involving ρ³ and lower powers).
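For readers who want to experiment, the sketch below evaluates a few of these polynomials in the OSA single-index convention and builds a wavefront map following Eq. (4.9); the selected terms and coefficient values are arbitrary examples of ours:

```python
import numpy as np

# A few low-order Zernike polynomials, OSA single-index notation
# (only the terms discussed in the text; usual OSA normalization).
def zernike(j: int, rho: np.ndarray, theta: np.ndarray) -> np.ndarray:
    if j == 0:  return np.ones_like(rho)                           # piston
    if j == 1:  return 2 * rho * np.sin(theta)                     # tilt y
    if j == 2:  return 2 * rho * np.cos(theta)                     # tilt x
    if j == 3:  return np.sqrt(6) * rho**2 * np.sin(2 * theta)     # astig. 45 deg
    if j == 4:  return np.sqrt(3) * (2 * rho**2 - 1)               # defocus
    if j == 5:  return np.sqrt(6) * rho**2 * np.cos(2 * theta)     # astig. 0/90 deg
    if j == 12: return np.sqrt(5) * (6 * rho**4 - 6 * rho**2 + 1)  # spherical
    raise NotImplementedError(j)

# Build a wavefront map W = sum_j c_j Z_j on the unit pupil (Eq. 4.9):
y, x = np.mgrid[-1:1:256j, -1:1:256j]
rho, theta = np.hypot(x, y), np.arctan2(y, x)
W = 0.25 * zernike(4, rho, theta) + 0.05 * zernike(12, rho, theta)
W[rho > 1] = np.nan                        # defined only inside the pupil
```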

Fig. 4.4
An illustration presents wavefront aberrations. The top half depicts low order aberrations like tilt, defocus, and astigmatism, and the bottom half presents high order aberrations like coma, and spherical, each represented by colorful wavefront maps.

An arrangement of the Zernike polynomials represented as phase maps. The order of the polynomial increases downward. The upper portion of the figure corresponds to the low-order aberrations (LOA), and the bottom portion, which can be extended toward higher order polynomials, is denoted as high-order aberrations (HOA). We have also identified the classical Seidel aberrations with the corresponding Zernike polynomials

At this point, we want to pay attention to the units used in the previous expansion. This discussion is important to fully understand the optical meaning of the Zernike decomposition. These polynomials are defined on the unit circle (a circle of radius 1). To apply them to an actual circular aperture with an arbitrary radius, the radial coordinate used with the Zernike polynomials is normalized as ρ = r/R, where R is the radius of the aperture and r is the radial coordinate within the aperture. Then, ρ becomes a dimensionless variable, which also appears when defining the wavefront aberration, W(ρ, θ). However, W represents the distance between the reference sphere and the actual wavefront. Therefore, the coefficients cj in Eq. (4.9) are also given as distances. In some applications, cj are expressed as a fraction of the wavelength. Using these coefficients, it is possible to define a global parameter that quantifies the discrepancy with respect to the ideal wavefront due to a collection of Zernike aberrations. This parameter is known as the root mean square (RMS) and is defined as

$$ {\mathrm{RMS}}_J=\sqrt{\sum \limits_{j\in J}{c}_j^2}, $$
(4.10)

where J is a collection of subindices j that identifies the terms of interest within the whole wavefront aberration.

An important property of Eq. (4.9) is that the wavefront aberration can be characterized by the collection of coefficients of the expansion. Even more, the lower degree polynomials, i.e., those involving powers of ρ up to the second degree, should not be considered as aberrations (from an optical point of view) because they can be compensated by adding a spherical (or toric) wavefront. These contributions correspond to the Zernike polynomials from j = 0 to 5. Z0 is a constant term that does not disturb the shape of the aberration function (it works as an offset). The combination of Z1 and Z2 represents a tilt that could cause a misalignment of the system with respect to the axis of reference. Zernike polynomials Z3, Z4, and Z5 describe classical ametropias such as myopia, hypermetropia, and astigmatism. There exists a simple relation between the polynomial coefficients and the spherocylindrical (sphere + cylinder) ametropia of the eye:

$$ S=-\frac{4\sqrt{3}\,{c}_4}{R^2}-C/2, $$
(4.11)
$$ C=\frac{4\sqrt{6}\sqrt{c_3^2+{c}_5^2}}{R^2}, $$
(4.12)
$$ \theta =\frac{1}{2}{\tan}^{-1}\left(\frac{c_3}{c_5}\right), $$
(4.13)

where S, C, and θ are the sphere, cylinder, and axis of the conventional prescription notation (S, C × θ), respectively; c3, c4, and c5 are the coefficients of the Zernike expansion related to the spherical (or cylindrical) deviation of the wavefront; and R is the radius of the exit pupil of the system (for the human eye, it is related to the size of the pupil). When applied to the human eye, all these polynomials, from Z0 to Z5, are referred to as lower order aberrations (LOA), where the main contribution comes from the coefficients c3, c4, and c5, because the offset (c0) and the misalignment (c1 and c2) should be corrected by an appropriate setting of the measurement device for a normal eye.
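A sketch of Eqs. (4.11)–(4.13) in code form; if the coefficients are given in micrometers and the pupil radius in millimeters, the result comes out directly in diopters (note that sign conventions for the cylinder vary between instruments):

```python
import math

# Sphere/cylinder/axis from the second-order Zernike coefficients
# (Eqs. 4.11-4.13). Coefficients c3, c4, c5 in micrometers and pupil
# radius R in millimeters give S and C directly in diopters.
def zernike_to_prescription(c3: float, c4: float, c5: float, R: float):
    C = 4 * math.sqrt(6) * math.hypot(c3, c5) / R**2   # cylinder (Eq. 4.12)
    S = -4 * math.sqrt(3) * c4 / R**2 - C / 2.0        # sphere  (Eq. 4.11)
    theta = 0.5 * math.degrees(math.atan2(c3, c5))     # axis    (Eq. 4.13)
    return S, C, theta

# Example: pure defocus of c4 = 1 micron over a 3 mm pupil radius:
print(zernike_to_prescription(0.0, 1.0, 0.0, 3.0))     # S ~ -0.77 D, C = 0
```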

Polynomials of order higher than two are grouped into the higher order aberration (HOA) contribution and require special attention to understand their meaning, especially when moving to higher order polynomials, where the connection with classical Seidel aberrations is lost.

Fortunately, ophthalmic aberrometers provide quite a straightforward method to obtain the actual Zernike expansion of a given eye [31]. In fact, the aberrometer measures the wavefront aberration that is used to calculate the Zernike coefficients, cj, as:

$$ {c}_j=\int_0^{2\pi } d\theta \int_0^1W\left(\rho, \theta \right){Z}_j\left(\rho, \theta \right)\rho d\rho . $$
(4.14)

These coefficients are the typical output of the measurement system. For a given Zernike decomposition up to j = N (where N is typically set by the resolution and accuracy of the aberrometer), Eq. (4.10) with J = 0, …, N provides an overall value of the wavefront aberration. This quantity can be split into two main components, LOA and HOA, just by selecting the appropriate subindices j, as JLOA = 0, …, 5 and JHOA = 6, …, N, when calculating RMSLOA and RMSHOA, respectively. Even more, the amount of wavefront aberration that can be corrected using classical prescriptions (sphere + cylinder) is represented by RMSj=3,4,5.

Wave Optics for Image-Forming Optical Systems

In the previous description of the index of refraction, we briefly used the concept of wavelength, λ, to define the wavefront aberration. This parameter is directly linked to the electromagnetic nature of light. In this framework, light is seen as a propagating electromagnetic wave. The description of these waves was given by Maxwell through four fundamental equations that couple together electric and magnetic phenomena. Actually, one of the key points for accepting this model was the prediction of electromagnetic waves having a velocity related to both electric and magnetic parameters (the electric permittivity, ε, and the magnetic permeability, μ) that were already part of the description of electricity and magnetism. Then, it could be proved that \( c=1/\sqrt{\varepsilon_0{\mu}_0} \) (where the subindex denotes vacuum). If light is a wave, it can generate interference and diffraction when light superposes with light. This induces significant departures from the predictions of the geometrical model. Now, shadows are not sharp anymore (even for a single-point light source), and light can, slightly, bend around corners. This is diffraction, and this phenomenon explains very well the limit of resolution—the capability of distinguishing two separate objects in the image—of optical systems.

To understand this, we only need to think of light as a wave that travels across space. When this wave reaches an aperture (or an obstacle), part of the light is blocked by the opaque portion of the aperture, and only the open part is active for further propagation (see Fig. 4.5). From a geometrical optics point of view, the propagation of light would define a sharp transition between light and shadow after the aperture. But now, light is a wave, and when it reaches the aperture, each portion of the wave passing through it acts as a new emitter of waves propagating again from the aperture. The consequence is that light bends around the edge and propagates beyond the geometrical shadow. If the aperture is circular, the distribution of light intensity can be described as

$$ I\left(\theta \right)={I}_0{\left[\frac{2{J}_1\left(\frac{2\pi }{\lambda }a\sin \theta \right)}{\frac{2\pi }{\lambda }a\sin \theta }\right]}^2, $$
(4.15)

where J1 is the first-order Bessel function of the first kind, λ is the wavelength, a is the radius of the circular aperture, and θ is the angle with respect to the propagation direction of the center of the light beam. This situation is depicted in Fig. 4.6, where we have represented the Airy spot that could be seen on a screen. When considering the image point given by an optical system, even though the system may be perfect from a geometrical point of view, diffraction causes the image of a point-like source to be a finite spot (if the aperture is circular, it is described by Eq. (4.15) and plotted in Fig. 4.5). Moreover, if we have two point sources, their images will be distinguished if their respective Airy spots do not overlap. The Airy disk has a characteristic pattern with a strong maximum at the center and several dark and bright rings around it. The first dark ring is used to define the resolving power of the system through the well-known expression

$$ {\theta}_{\mathrm{res}}=\frac{1.22\lambda }{D}, $$
(4.16)

where D is the diameter of the aperture of the optical system, λ is the wavelength, and θres represents the limiting angular separation of two point sources. If the angular separation is larger than θres, they are resolved; if smaller, the optical instrument is unable to distinguish them as two separate point sources (see Fig. 4.6). This condition is also known as the Rayleigh criterion. Therefore, not only geometrical optics (or ray tracing) limits the quality of optical systems; diffraction, as a consequence of the wave nature of light, also constrains the capabilities of image-forming systems. As a simple application of the Rayleigh diffraction limit, the human eye, having a usable entrance pupil diameter of about D = 6 mm, generates a resolution angle of θres ≈ 0.4 arcmin, which fits very well with the angular separation between photodetectors in the retinal mosaic [32].
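The quoted value for the eye follows directly from Eq. (4.16), as this short sketch shows:

```python
import math

# Rayleigh resolution limit (Eq. 4.16) for an eye-like pupil.
wavelength = 555e-9            # m, mid-visible
D = 6e-3                       # m, pupil diameter used in the text
theta_res = 1.22 * wavelength / D            # radians
arcmin = math.degrees(theta_res) * 60.0
print(f"theta_res = {theta_res:.3e} rad = {arcmin:.2f} arcmin")  # ~0.39 arcmin
```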

Fig. 4.5
A diagram of an optical system focusing light beams. It illustrates how parallel rays converge through a lens onto a focal point, marked by f′, with an aperture a affecting the rays' convergence angle θres on the image plane.

A collection of parallel rays coming from an object located at infinity is represented as a plane wave. This wave diffracts when passing through a lens located at the plane XY, having a circular aperture of radius a, and generates a distribution of light at its focal plane (the plane X₀Y₀). This spot is known as the Airy disk (Eq. 4.15). The angle θres describes the angular location of the first dark ring of the Airy spot. This diffraction happens even for an unaberrated lens

Fig. 4.6
4 stages of optical resolution. A, a single point of light, b two points so close they appear as one labeled unresolved, c, two points just distinguishable labeled just resolved, and d, two distinct points of light labeled fully resolved.

A graphical representation of the image plane of an optical system having the same focal length as the human eye, f′eye = 16 mm, with a pupil diameter D = 2a = 4 mm, for a wavelength at the center of the visible spectrum, λ = 555 nm. (a) Each point-like source is imaged as a spatial light distribution (Airy spot) on the image plane of the optical system. (b) Two point sources are not resolved if they are located very close. (c) Light distribution for two point sources that are angularly separated by θres = 1.22λ/D. (d) Two points separated by more than the angle of resolution, θres, can be clearly distinguished

The Quality of an Optical System

In this section, we introduce a further refinement of the description of an optical system that is fully based on the electromagnetic model of light. Optical rays and light trajectories will be replaced by wavefronts and by the spatial distribution of irradiance (power per unit area) of the light. At the same time, when possible, we will look back to relate these new concepts to geometrical parameters and reasoning.

As the first step, let us recall the first Maxwell's condition for a perfect optical system: the image of a point source also has to be a point. However, we have seen that aberrations disrupt this ideal behavior, and the point-like image is not achieved. From the wave optics point of view, a point object is a source of perfect spherical wavefronts, and a point image is attained when a perfect spherical wavefront collapses at it. This is why the wavefront aberration is defined as the departure between the ideal spherical wavefront and the actual one generated by the optical system. We have already seen how this wavefront aberration can be described in terms of Zernike polynomials and how the coefficients of this expansion (see Eqs. (4.9) and (4.14)) can be related to low- and high-order aberrations. Until here, we would have a mere mathematical description of the wavefront, but we need more: we have to know how aberrations impact the distribution of light at the image plane. Then, we define quite a simple but powerful concept that describes the actual distribution of light on the image plane when the object is a point-like source. This distribution is known as the point spread function, PSF(xi, yi), where xi and yi are spatial coordinates on the image plane. Knowing that the PSF applies to a point source, if we have an extended source that can be seen as a collection of point sources, the resulting image is the superposition of the PSF at the locations of the images of every single point in the object. We can write this mathematically as follows:

$$ I\left({x}_{\mathrm{i}},{y}_{\mathrm{i}}\right)=\int \int O\left({x}_{\mathrm{o}},{y}_{\mathrm{o}}\right)\mathrm{PSF}\left({x}_{\mathrm{i}}-M{x}_{\mathrm{o}},{y}_{\mathrm{i}}-M{y}_{\mathrm{o}}\right)\mathrm{d}{x}_{\mathrm{o}}\mathrm{d}{y}_{\mathrm{o}}, $$
(4.17)

where O(xo, yo) represents the light distribution at the object plane (using spatial coordinates xo and yo), M is the lateral magnification of the system (describing the scale factor between the image and the object), and I(xi, yi) is the light distribution at the image plane. In technical language, the previous integration is known as a convolution product [33, 34].
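The convolution of Eq. (4.17) can be sketched numerically; here a Gaussian profile stands in for the real PSF of a system, purely for illustration:

```python
import numpy as np
from scipy.signal import fftconvolve

# Image formation as a convolution (Eq. 4.17), with a Gaussian profile
# standing in for the system's actual point spread function.
def gaussian_psf(size: int = 33, sigma: float = 2.0) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()                 # normalize: conserve energy

obj = np.zeros((128, 128))
obj[64, 48] = obj[64, 80] = 1.0            # two point sources
img = fftconvolve(obj, gaussian_psf(), mode="same")
# Each point is replaced by a copy of the PSF; close points blur together.
```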

Before going further, let us take a look at the PSF of an optical system. The behavior of waves is governed by a different set of rules compared to geometrical ray tracing. One of the first consequences of the wave model is that optical wavefronts are distorted when passing through apertures. This phenomenon is known as diffraction, and it occurs even for the perfect spherical wavefronts associated with point-like objects or images. The consequence is that, for any practical system, the image of a point will never be a point, which is a serious violation of the first Maxwell's condition for a perfect optical system. Then, we can conclude that perfect optical systems only exist within the paraxial approach. As a typical example, if we consider an optical system free of aberrations (a perfect optical system within the geometrical model), but having a finite transversal size realized as a circular aperture, the image of a point source (its PSF) is the well-known Airy disk (see Figs. 4.5 and 4.6). When this happens, we have the best possible optical instrument, which is qualified as a diffraction-limited optical system.

A dedicated discussion on how the images coming from two point-like sources overlap helped to define the Rayleigh criterion for the resolving power of an optical system (see Eq. 4.16 and Fig. 4.6). The same situation happens when trying to distinguish the bright and dark stripes of a periodic grating: if they are not resolved, the contrast between dark and bright is lower, and they tend to look like a uniformly illuminated object. These objects are very useful in optics when describing the quality of an optical system. In fact, their use relies on a mathematical transformation known as the Fourier transform. The concept is quite simple: a periodic distribution of light can be associated with a given spatial frequency, where this spatial frequency is just the inverse of the spatial period of the object. For example, if a periodic variation repeats itself only once over an angular extent of 1°, its spatial frequency is 1 cycle/deg; if the spatial period repeats twice over that extent, the spatial frequency is 2 cycles/deg. The same can be said if the periodicity is defined over a given length, providing spatial frequencies expressed in cycles/mm. The key advantage of this treatment is that any arbitrary light distribution can be expanded as a superposition of pure periodic light distributions, each one having its characteristic spatial frequency and a weight in the superposition calculated through a very sound mathematical relation. In optics, as far as the distribution of light is usually projected on a plane (meaning two dimensions), the applicable Fourier transform also needs to be two-dimensional. From a mathematical point of view, this transformation is given as:

$$ \varPhi \left(\xi, \eta \right)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty }I\left(x,y\right)\exp \left[-i2\pi \left( x\xi + y\eta \right)\right]\mathrm{d}x\mathrm{d}y, $$
(4.18)
$$ I\left(x,y\right)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\varPhi \left(\xi, \eta \right)\exp \left[+i2\pi \left( x\xi + y\eta \right)\right]\mathrm{d}\xi \mathrm{d}\eta, $$
(4.19)

where I(x, y) is the light distribution on a given plane with coordinates (x, y), and Φ(ξ, η) is the so-called spatial frequency spectrum (or Fourier transform of I), where the coordinates ξ and η represent the spatial frequencies along the X and Y directions, respectively (i is the imaginary unit, i² = −1).
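In practice, Eqs. (4.18) and (4.19) are evaluated with the discrete fast Fourier transform; the sketch below (the grating period is an arbitrary choice of ours) shows how a periodic object maps to peaks at its spatial frequency:

```python
import numpy as np

# Discrete 2-D Fourier transform of a light distribution (Eqs. 4.18-4.19).
I = np.zeros((256, 256))
I[:, ::16] = 1.0                             # vertical grating, period 16 px
Phi = np.fft.fftshift(np.fft.fft2(I))        # spatial-frequency spectrum
freq = np.fft.fftshift(np.fft.fftfreq(256))  # frequency axis, cycles/pixel
# |Phi| peaks at xi = +/- 1/16 cycles/px, the grating's fundamental frequency.
```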

The full capabilities of this methodology based on the Fourier transform are beyond the scope of this chapter; they range from image-processing algorithms to the design of optical systems. However, a couple of things are worth mentioning here: Fourier transforms provide a framework where the image-forming mechanism can be seen as the application of a filter in spatial frequencies; also, this formalism makes it possible to define important figures of merit of optical systems, such as the modulation transfer function. Following the first point, we can rewrite Eq. (4.17) as:

$$ {\varPhi}_i\left(\xi, \eta \right)={\varPhi}_o\left(\xi, \eta \right)\mathrm{OTF}\left(\xi, \eta \right), $$
(4.20)

where Φi and Φo are the Fourier transforms of the image and the object, respectively (I(xi, yi) and O(xo, yo) in Eq. (4.17)), and OTF is the optical transfer function, the Fourier transform of the PSF. Eq. (4.20) has very important consequences once we fully understand the meaning of the Fourier transform. The transformation from a distribution of light, I(x, y), to its spatial frequency spectrum, Φ(ξ, η), provides the same information but arranged in a different way. For example, the fine details in the object O(xo, yo), i.e., those portions requiring a higher resolution of the optical system, are represented by the values of Φ at larger spatial frequencies ξ and/or η. If the OTF has a zero value at the spatial frequencies related to those details, the image will not contain such information, and those high spatial frequency features will be lost.

The optical transfer function is a complex-valued function that can be written in terms of its modulus, MTF, and its phase, PTF: OTF = MTF exp(i PTF) (where i² = −1, and the complex exponential can be written in terms of a real and an imaginary part as exp(i PTF) = cos(PTF) + i sin(PTF)). Here, we find the modulation transfer function (MTF) as the modulus of the optical transfer function. So, we have a mathematically sound way of describing the image-forming procedure within the electromagnetic model.

Another way of understanding how the MTF quantifies the quality of an optical system is by exemplifying its effect using quite a simple object: a collection of sine-wave targets having different spatial periods (the spatial period, p, is related to the spatial frequency as ξ = 1/p), such as those depicted in Fig. 4.7, which present pure white at their maxima and pure black at their minima. Then, the contrast of these targets, defined as Mo(ξ) = (Imax − Imin)/(Imax + Imin), is equal to 1. These light distributions are imaged by an optical system having an MTF that is, typically, a decreasing function of ξ (see Fig. 4.7). The result is a collection of images, one for each target, where the maximum and the minimum are not pure white and black anymore, and the images show a different contrast. Then, the ratio between the contrast of the image and that of the object is the MTF at the given spatial frequency, ξ:

Fig. 4.7
A test pattern for an optical system with varying line densities, a Modulation Transfer Function graph exhibits system response to spatial frequencies, and the resulting image pattern illustrating the optical resolution limit.

An object having a sine wave distribution of light is imaged into another sine wave distribution that has a lower contrast than that of the object. The relation between the two contrasts is the value of the MTF at the spatial frequency of the object, ξ = 1/p, where p is the spatial period of the targets. In the upper row is a collection of four objects having a spatial frequency that increases when moving from the first to the fourth object (the spatial frequency is doubled in every step). We have considered an optical system that is represented by its MTF. The row at the bottom shows the image for every object. We can see how the contrast diminishes as the spatial frequency increases. The spatial frequency, ξ, is given as a multiple of a reference frequency ξ0. We have also represented the value of the cut-off frequency, ξcut−off, where the MTF vanishes

$$ \mathrm{MTF}\left(\xi \right)=\frac{M_i\left(\xi \right)}{M_o\left(\xi \right)}. $$
(4.21)

In every MTF plot, we find a value of the spatial frequency where the MTF reaches zero. This maximum frequency is known as the cut-off frequency and strongly depends on the applicable diffractive effects. For example, for a diffraction-limited optical system, the cut-off frequency is ξcut−off = D/λ if measured in cycles/rad and ξcut−off = D/(λf′) if expressed in cycles/mm, where f′ is the focal length of the optical system [35]. We can see that this cut-off frequency is strongly related to the angular resolution, θres (see Eq. (4.16)) (Fig. 4.8).
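These two expressions for the cut-off frequency are easy to evaluate; the sketch below uses the eye-like numbers quoted in Fig. 4.6:

```python
# Diffraction-limited cut-off frequency of the MTF.
wavelength = 555e-9      # m, mid-visible
D = 4e-3                 # m, pupil diameter (value used in Fig. 4.6)
f_prime = 16e-3          # m, focal length quoted for the eye

xi_angular = D / wavelength                  # cycles/rad
xi_spatial = D / (wavelength * f_prime)      # cycles/m on the image plane
print(f"{xi_angular:.3e} cycles/rad, {xi_spatial / 1e3:.1f} cycles/mm")
# Note that 1/xi_angular is on the order of theta_res = 1.22*lambda/D.
```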

Therefore, the MTF becomes a figure of merit of the optical system that clearly describes how good an instrument is at reproducing a given object. As a matter of fact, by using this concept it is possible to understand that, even in the absence of aberrations, an optical instrument is not able to reproduce all the details of the object, because the MTF is only 1 at ξ = 0 (ξ = 0 means that the object has a constant distribution of light, i.e., a uniform background), and the contrast for spatial frequencies larger than 0 will be diminished. This situation, where only diffraction is considered, establishes the attainable goal for the quality of an optical system: when all the aberrations are removed, the system reaches the diffraction-limited behavior.

When analyzing the actual behavior of the eye, there is a more psychophysical function, known as the contrast sensitivity function (CSF), that measures the perceived contrast of sinusoidal patterns. The CSF contains contributions from the optical system of the eye plus the response of the processing unit, the visual cortex of the brain. Therefore, the information provided by the MTF has to be weighted by the neural response, characterized by the neural contrast sensitivity function, CSFN, to provide the actual value of the contrast sensitivity function in the form CSF = MTF × CSFN [36].

Conclusions

These ideas and formalisms are part of the tools necessary for a full understanding of the fitting of intra-ocular lenses. The optical behavior of the human eye can be outlined using the paraxial formalism. However, the results of the first-order approximation fall short given the new advances in science and technology: better tools for diagnosis, improved morphological characterization, and high-precision surgical procedures. Ophthalmic aberrometers and corneal topography systems provide sufficient information about the contribution of the optical elements of the eye: cornea and lens. Pachymetry and some optical coherence tomographic techniques measure the longitudinal dimensions of the eye, cornea, and lens. All these tools, along with the data obtained for the optical constants of the ocular media (corneal stroma, aqueous humor, lens, and vitreous body), can provide an estimate of the human eye's optical performance. Vision research laboratories are at the forefront in obtaining values of the wavefront aberration, W(ρ, θ), the PSF and MTF of the eye, and in analyzing the psychophysical response of the visual system to a wide variety of stimuli and conditions: monochromatic and polychromatic tests, photopic and scotopic illuminations, etc. Soon enough, the advances in research will be applied to ophthalmology's daily practice. As a practical example, the contribution to the total aberration coming from the corneal topography—external and internal surfaces—can be detached from the total aberration, and the lens contribution can be extracted. Therefore, an advanced design of an intra-ocular lens that compensates both contributions, located at the lens position, could improve the quality of the eye toward the diffraction-limited situation. However, the neural adaptation of the visual system to the native aberration may temporarily jeopardize the improvements made: the brain must readapt itself to the new optical performance of the eye.

Fig. 4.8
An optical experiment setup with three columns. The first presents an emblem as the object and its spatial frequency representation. The second illustrates apertures of different sizes. The third depicts the resulting images, with quality decreasing from a large to a narrow aperture.

The object at the top left can be coded in spatial frequencies through the application of the Fourier transformation (represented in logarithmic scale at the bottom left). Both representations to the left of this figure contain the same information. At the right, we have simulated how the object is reproduced when the system is not able to represent high frequency components (fine details). This filtering is strongly dependent on the aperture size of the optical system

In this chapter, we have revisited the basic concepts of image-forming systems from two points of view: the geometrical realm and the physical optics model. We have seen that beyond paraxiality, it is still possible to understand how light propagates from the object to the image. Light trajectories can be calculated with quite a simple set of rules. These rules are efficiently applied by computers to provide an accurate evaluation of the system’s performance. This performance is affected by aberrations, which disturb the ideal conditions, and by diffraction, which intrinsically limits the performance of an optical system. Although aberrations can be controlled in an efficient way, diffraction will ultimately limit the quality of the image.

Both diffraction and aberrations limit the optical performance of the human eye. A full understanding of these limitations may help us find efficient solutions when vision quality is compromised and its recovery requires surgical treatments or the replacement of bio-elements by artificial ones. Modern intra-ocular lens designs are key in today's ophthalmological treatments. They offer controlled aberration, multiple foci, and improved biocompatibility and biostability. Moreover, advanced medical skills and procedures are now continuously challenging the limits of technology and science to provide better and more flexible solutions for the well-being of patients.