# Comparison of Discrete and Continuous Wavelet Transforms

**DOI:** https://doi.org/10.1007/978-3-642-27737-5_77-2

## Keywords

Hilbert Space; Wavelet Transform; Continuous Wavelet Transform; Continuous Wavelet; Multiresolution Analysis

Our purpose is to outline a number of direct links between the two cases of wavelet analysis: continuous and discrete. The theme of the first is perhaps best known, for example, the creation of compactly supported wavelets in *L* ^{2}(ℝ^{ n }) with suitable properties such as localization, vanishing moments, and differentiability. The second (discrete) deals with computation, with sparse matrices, and with algorithms for encoding digitized information such as speech and images. This is centered on constructive approaches to subdivision filters, their matrix representation (by sparse matrices), and corresponding fast algorithms. For both approaches, we outline computational transforms; but our emphasis is on effective and direct links between computational analysis of discrete filters on the one side and on continuous wavelets on the other. By the latter, we include both *L* ^{2}(ℝ^{ n }) analysis and fractal analysis. To facilitate the discussion of the interplay between discrete (used by engineers) and continuous (harmonic analysis), we include a list of terminology commonly used in the two areas; and we include comments on translation between them.

*Multiresolutions.*Haar’s work from 1909 to 1910 implicitly had the key idea which got wavelet mathematics started on a roll 75 years later with Yves Meyer, Ingrid Daubechies, Stéphane Mallat, and others – namely, the idea of a multiresolution. In that respect Haar was ahead of his time. See Figs. 1 and 2 for details.

The word “multiresolution” suggests a connection to optics from physics. So that should have been a hint to mathematicians to take a closer look at trends in signal and image processing! Moreover, even staying within mathematics, it turns out that as a general notion, this same idea of a “multiresolution” has long roots in mathematics, even in such modern and pure areas as operator theory and Hilbert space geometry. Looking even closer at these interconnections, we can now recognize scales of subspaces (so-called multiresolutions) in classical algorithmic construction of orthogonal bases in inner-product spaces, now taught in lots of mathematics courses under the name of the Gram–Schmidt algorithm. Indeed, a closer look at good old Gram–Schmidt reveals that it is a matrix algorithm, hence new mathematical tools involving non-commutativity!

If the signal to be analyzed is an image, then why not select a fixed but suitable *resolution* (or a subspace of signals corresponding to a selected resolution) and then do the computations there? The selection of a fixed “resolution” is dictated by practical concerns. That idea was key in turning the computation of wavelet coefficients into iterated matrix algorithms. As the matrix operations get large, the computation is carried out in a variety of paths arising from big matrix products. The dichotomy, continuous vs. discrete, is quite familiar to engineers. The industrial engineers typically work with huge volumes of numbers.

Numbers! – so why wavelets? Well, what matters to the industrial engineer is not really the wavelets, but the fact that special wavelet functions serve as an efficient way to encode large data sets – I mean encode for computations. And the wavelet algorithms are computational. They work on numbers. Encoding numbers into pictures, images, or graphs of functions comes later, perhaps at the very end of the computation. But without the graphics, I doubt that we would understand any of this half as well as we do now. The same can be said for the many issues that relate to the crucial mathematical concept of self-similarity, as we know it from fractals and more generally from recursive algorithms.

## Glossary

This *Glossary* consists of a list of terms used in this entry: in mathematics, in probability, in engineering, and on occasion in physics. To clarify the seemingly confusing use of up to four different names for the same idea or concept, we have further added informal explanations spelling out the reasons behind the differences in current terminology from neighboring fields.


Mathematically, functions may map between any two sets. Yet function theory is widely used in engineering, where functions are typically thought of as signals, for example, functions of a time variable. Turning to physics, in our present application, the functions will typically lie in some Hilbert space, where they represent physical states.


Mathematically, a sequence is a function defined on the integers ℤ or on subsets of ℤ, for example, the natural numbers ℕ. Hence, if time is discrete, this to the engineer represents a time series, such as a speech signal, or any measurement which depends on time. But we will also allow functions on lattices such as ℤ^{ d }. In the language of probability, the corresponding discrete-time notion is a process such as a random walk on ℤ^{ d }.


While finite or infinite families of nested subspaces are ubiquitous in mathematics and have been popular in Hilbert space theory for generations (at least since the 1930s), this idea was revived in a different guise in 1986 by Stéphane Mallat, then an engineering graduate student. In its adaptation to wavelets, the idea is now referred to as the multiresolution method.

What made the idea especially popular in the wavelet community was that it offered a skeleton on which various discrete algorithms in applied mathematics could be attached and turned into wavelet constructions in harmonic analysis. In fact, what we now call multiresolutions have come to signify a crucial link between the world of discrete wavelet algorithms, which are popular in computational mathematics and in engineering (signal/image processing, data mining, etc.), on the one side, and continuous wavelet bases in function spaces, especially in *L* ^{2}(ℝ^{ d }), on the other.

But in mathematics, or more precisely in operator theory, the underlying idea dates back to the work of John von Neumann, Norbert Wiener, and Herman Wold, where nested and closed subspaces in Hilbert space were used extensively in an axiomatic approach to stationary processes, especially for time series. Wold proved that any (stationary) time series can be decomposed into two different parts: The first (deterministic) part can be exactly described by a linear combination of its own past, while the second part is the opposite extreme; it is purely nondeterministic (noise-like). Von Neumann’s version of the same theorem is a pillar in operator theory.
It states that every isometry in a Hilbert space ℋ is the unique sum of a shift isometry and a unitary operator, i.e., the initial Hilbert space ℋ splits canonically as an orthogonal sum of two subspaces, one carrying the shift part and the other, here denoted ℋ_{ u }, the unitary part. The shift part is built from a nested scale of closed subspaces:

\( \begin{array}{c}\cdots \subset {V}_{-1}\subset {V}_0\subset {V}_1\subset {V}_2\subset \cdots \subset {V}_n\subset {V}_{n+1}\subset \cdots \\ {}\underset{n}{\bigwedge }{V}_n={\mathrm{\mathcal{H}}}_u,\kern0.5em \mathrm{and}\kern0.5em \underset{n}{\bigvee }{V}_n=\mathrm{\mathcal{H}}.\end{array} \)

However, Stéphane Mallat was motivated instead by the notion of scales of resolutions in the sense of optics. This in turn is based on a certain “artificial-intelligence” approach to vision and optics, developed earlier by David Marr at MIT, an approach which imitates the mechanism of vision in the human eye.

The connection from these developments in the 1980s back to von Neumann is this: Each of the closed subspaces *V* _{ n } corresponds to a level of resolution. This view became an instant hit in the wavelet community, as it offered a repository for the fundamental father and the mother functions, also called the scaling function *φ* and the wavelet function *ψ*. In all of this, there was a second “accident” at play: As it turned out, pyramid algorithms in wavelet analysis now lend themselves, via multiresolutions, or nested scales of closed subspaces, to an analysis based on frequency bands. Here we refer to bands of frequencies as they have already been used for a long time in signal processing. One reason for the success in varied disciplines of the same geometric idea is perhaps that it is closely modeled on how we historically have represented numbers in the positional number system. Analogies to the Euclidean algorithm seem especially compelling.


In linear algebra, students are familiar with the distinction between (linear) transformations and the matrices that represent them; in infinite dimensions, the analogous objects are the bounded linear operators acting in Hilbert space. This context is somewhat different from that of quantum mechanical (QM) operators, the observables, which are typically unbounded and self-adjoint.


The following dual pairs position the same idea on the two sides of a Fourier transform: time is dual to frequency, and, in QM, position is dual to momentum. If instead some filter is specified in the time domain, then its Fourier transform, the frequency response, acts by pointwise multiplication in the frequency domain.


Pointwise multiplication of functions of frequencies corresponds in the Fourier dual time domain to the operation of convolution (or of Cauchy product if the time scale is discrete). The process of modifying a signal with a fixed convolution is called a linear filter in signal processing. The corresponding Fourier dual frequency function is then referred to as “frequency response” or the “frequency response function.” More generally, in the continuous case, since convolution tends to improve smoothness of functions, physicists call it “smearing.” | |||
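This duality can be checked numerically. The sketch below (assuming numpy; the signal and the filter taps are arbitrary illustrations, not taken from this entry) verifies that circular convolution in the time domain matches pointwise multiplication by the frequency response in the Fourier domain.

```python
import numpy as np

def circular_convolve(x, h):
    """Circular (periodic) convolution of two equal-length sequences."""
    n = len(x)
    return np.array([sum(x[k] * h[(j - k) % n] for k in range(n)) for j in range(n)])

# A signal and an (arbitrary, illustrative) low-pass-like filter.
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0])
h = np.array([0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25])

# Time domain: convolution.  Frequency domain: pointwise multiplication
# by the frequency response (the DFT of the filter taps).
time_domain = circular_convolve(x, h)
freq_domain = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

assert np.allclose(time_domain, freq_domain)
```

The same identity underlies fast filtering: one FFT, one pointwise product, one inverse FFT.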


Calculating the Fourier coefficients is “analysis,” and adding up the pure frequencies (i.e., summing the Fourier series) is called synthesis. But this view carries over more generally to engineering where there are more operations involved on the two sides, e.g., breaking up a signal into its frequency bands, transforming further, and then adding up the “banded” functions in the end. If the signal out is the same as the signal in, we say that the analysis/synthesis yields perfect reconstruction. | |||
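As a minimal sketch of perfect reconstruction (assuming numpy; the function names are ours), here is one analysis/synthesis step with the Haar filter pair: the signal is split into an average band and a detail band, and synthesis recovers the input exactly.

```python
import numpy as np

def haar_analysis(x):
    """Split a signal of even length into an average band and a detail band."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # averages (low band)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # local differences (high band)
    return a, d

def haar_synthesis(a, d):
    """Recombine the two bands; signal out equals signal in."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0])
a, d = haar_analysis(x)
assert np.allclose(haar_synthesis(a, d), x)  # perfect reconstruction
```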


Here the terms related to “synthesis” refer to the second half of the kind of signal-processing design outlined in the previous paragraph. | |||


For a space of functions (signals), the selection of certain frequencies serves as a way of selecting special signals. When the process of scaling is introduced, as in the optics of a digital camera, a nested family of subspaces corresponds to a grading of visual resolutions.


For the operator systems (*S* _{ i }) arising from *N*-band subband filters (see below), the defining relations are

\( {\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=1,\kern0.5em \mathrm{and}\kern0.5em {S}_i^{*}{S}_j={\delta}_{i,j}1. \)


In many applications, a vector space with inner product captures perfectly the geometric and probabilistic features of the situation. This can be axiomatized in the language of Hilbert space; and the inner product is the most crucial ingredient in the familiar axiom system for Hilbert space. | |||


Systems theory language for operators: the engineer’s “black box” is a device which transforms an input signal into an output signal, i.e., an operator.


Intuitively, think of a fractal as reflecting similarity of scales such as is seen in fernlike images that look “roughly” the same at small and at large scales. Fractals are produced from an infinite iteration of a finite set of maps, and this algorithm is perfectly suited to the kind of subdivision which is a cornerstone of the discrete wavelet algorithm. Self-similarity could refer alternately to space and to time. And further versatility is added, in that flexibility is allowed into the definition of “similar.” | |||
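A short sketch of the “infinite iteration of a finite set of maps” (assuming numpy; the middle-third Cantor set is a standard example, not specific to this entry): each pass applies both branch maps to every point, and the resulting point set approximates the self-similar attractor.

```python
import numpy as np

# The middle-third Cantor set is the attractor of these two affine maps;
# iterating them from any starting point converges toward the fractal.
maps = [lambda x: x / 3.0, lambda x: (x + 2.0) / 3.0]

def iterate_ifs(points, n_iter):
    """Deterministic IFS algorithm: apply every branch map to every point, n_iter times."""
    pts = np.asarray(points, dtype=float)
    for _ in range(n_iter):
        pts = np.concatenate([m(pts) for m in maps])
    return pts

approx = iterate_ifs([0.5], 8)   # 2**8 points approximating the Cantor set
assert np.all((approx >= 0.0) & (approx <= 1.0))
# Subdivision structure: after one pass, no point lies in the removed middle third.
assert np.all((approx <= 1/3 + 1e-12) | (approx >= 2/3 - 1e-12))
```

The same branching structure (here 2-fold) is what the subdivision step of the discrete wavelet algorithm exploits.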


The problem of how to handle and make use of large volumes of data is a corollary of the digital revolution. As a result, the subject of data mining itself changes rapidly. Digitized information (data) is now easy to capture automatically and to store electronically. In science, in commerce, and in industry, data represents collected observations and information: In business, there is data on markets, competitors, and customers. In manufacturing, there is data for optimizing production opportunities and for improving processes. A tremendous potential for data mining exists in medicine, genetics, and energy. But raw data is not always directly usable, as is evident by inspection. A key to advances is our ability to extract usable patterns and structure from raw data. One of the structures often hidden in data sets is some degree of self-similarity of scales.

## Definition

In this entry we outline several points of view on the interplay between discrete and continuous wavelet transforms, stressing both pure and applied aspects of both. We outline some new links between the two transform technologies based on the theory of representations of generators and relations. By this, we mean a finite system of generators which are represented by operators in Hilbert space. We further outline how these representations yield subband filter banks for signal- and image-processing algorithms.

The word “wavelet transform” (WT) means different things to different people: Pure and applied mathematicians typically give different answers to the question “What is the WT?” And engineers in turn have their own preferred, quite different, approach to WTs. Still there are two main trends in how WTs are used: the *continuous* WT on one side and the *discrete* WT on the other. Here we offer a user-friendly outline of both but with a slant toward geometric methods from the theory of operators in Hilbert space.

Our entry is organized as follows: For the benefit of diverse reader groups, we begin with section “Glossary.” This is a substantial part of our account, and it reflects the multiplicity of how the subject is used.

The concept of multiresolutions or multiresolution analysis (MRA) serves as a link between the discrete and continuous theory.

In section “List of Names and Discoveries,” we summarize how different mathematicians and scientists have contributed to and shaped the subject over the years.

The next two sections then offer a technical overview of both discrete and continuous WTs. This includes basic tools from Fourier analysis and from operators in Hilbert space. In sections “Tools from Mathematics” and “A Transfer Operator,” we outline the connections between the separate parts of mathematics and their applications to WTs.

## Introduction

While applied problems such as time series, signals, and processing of digital images come from engineering and from the sciences, they have in the past two decades taken on a life of their own as an exciting new area of applied mathematics. Searches in Google on these keywords typically yield sites numbered in the millions; the diversity of applications is wide, and it seems reasonable here to narrow our focus to some of the approaches that are both more mathematical and more recent. For references, see, for example, Aubert and Kornprobst (2006), Bredies et al. (2006), Liu (2006), Strang and Nguyen (1996). In addition, our own interests (e.g., Jorgensen 2003, 2006a; Song 2006a, b) have colored the presentation below. Each of the two areas, the discrete side and the continuous theory, is huge as measured by recent journal publications. A leading theme in our entry is the independent interest in a multitude of interconnections between the discrete algorithms and their uses in the more mathematical analysis of function spaces (continuous wavelet transforms). The mathematics involved in the study and the applications of this interaction is, we feel, of benefit to both mathematicians and engineers. See also Jorgensen (2003). An early paper (Daubechies and Lagarias 1992) by Daubechies and Lagarias was especially influential in connecting the two worlds, discrete and continuous.

## The Discrete Versus Continuous Wavelet Algorithms

### The Discrete Wavelet Transform

If one stays with function spaces, it is then popular to pick the *d*-dimensional Lebesgue measure on ℝ^{ d }, *d* = 1, 2,…, and pass to the Hilbert space *L* ^{2}(ℝ^{ d }) of all square integrable functions on ℝ^{ d }. A wavelet basis refers to a family of basis functions for *L* ^{2}(ℝ^{ d }) generated from a finite set of normalized functions *ψ* _{ i }, the index *i* chosen from a fixed and finite index set *I*, and from two operations: one called scaling and the other translation. The scaling is typically specified by a *d* × *d* matrix *A* over the integers ℤ such that all its eigenvalues are bigger than one in modulus, i.e., lie outside the closed unit disk in the complex plane. The *d*-lattice is denoted ℤ^{ d }, and the translations will be by vectors selected from ℤ^{ d }. We say that we have a wavelet basis if the triple-indexed family *ψ* _{ i,j,k }(*x*) := |*detA*|^{ j/2} *ψ* _{ i }(*A* ^{ j } *x* + *k*) forms an orthonormal basis (ONB) for *L* ^{2}(ℝ^{ d }) as *i* varies in *I*, *j* ∈ ℤ, and *k* ∈ ℤ^{ d }. The word “orthonormal” for a family *F* of vectors in a Hilbert space ℋ refers to the norm and the inner product in ℋ: The vectors in an orthonormal family *F* are assumed to have norm one and to be mutually orthogonal. If the family is also total (i.e., the vectors in *F* span a subspace which is dense in ℋ), we say that *F* is an orthonormal basis (ONB).
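For the dyadic case *d* = 1, *A* = 2, the ONB property can be sampled numerically. This sketch (assuming numpy, with the Haar function as the single generator *ψ*) checks orthonormality of a finite part of the family *ψ* _{ j,k }(*x*) = 2^{ j/2} *ψ*(2^{ j } *x* − *k*); midpoint-rule integration on a dyadic grid is exact for these step functions.

```python
import numpy as np

def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return (np.where((x >= 0) & (x < 0.5), 1.0, 0.0)
            - np.where((x >= 0.5) & (x < 1.0), 1.0, 0.0))

def psi(j, k, x):
    """Dyadic wavelet family psi_{j,k}(x) = 2**(j/2) * psi(2**j * x - k)."""
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * x - k)

x = (np.arange(1024) + 0.5) / 1024.0          # midpoint grid on [0, 1)
dx = 1.0 / 1024.0

def inner(f, g):
    """L2 inner product on [0, 1), exact here since breakpoints align with the grid."""
    return np.sum(f * g) * dx

# A finite block of the family: j = 0, 1, 2 with all admissible shifts k.
fam = [psi(j, k, x) for j in range(3) for k in range(2 ** j)]
gram = np.array([[inner(f, g) for g in fam] for f in fam])
assert np.allclose(gram, np.eye(len(fam)))    # orthonormal family
```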

While there are other popular wavelet bases, for example, frame bases and dual bases (see, e.g., Baggett et al. (2005), Dutkay and Roysland (2007b) and the papers cited there), the ONBs are the most agreeable at least from the mathematical point of view.

That there are bases of this kind is not at all clear, and the subject of wavelets in this continuous context has gained much from its connections to the discrete world of signal and image processing.

Here we shall outline some of these connections with an emphasis on the mathematical context. So we will be stressing the theory of Hilbert space and bounded linear operators acting in Hilbert space ℋ, both individual operators and families of operators which form algebras.

As was noticed recently, the operators which specify particular subband algorithms from the discrete world of signal processing turn out to satisfy relations that were found (or rediscovered independently) in the theory of operator algebras and which go under the name of Cuntz algebras, denoted \( {\mathcal{O}}_N \) if *N* is the number of bands. For additional details, see, e.g., Jorgensen (2006a).

The Cuntz *C**-algebra \( {\mathcal{O}}_N \) has generators (*S* _{ i })_{ i=0} ^{ N−1}, and the relations are

\( {\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=\mathbf{1},\kern2em (1) \)

\( {S}_i^{*}{S}_j={\delta}_{i,j}\mathbf{1},\kern2em (2) \)

where **1** is the identity element in \( {\mathcal{O}}_N \). In a representation on a Hilbert space ℋ, the generators *S* _{ i } turn into bounded operators, also denoted *S* _{ i }, and the identity element **1** turns into the identity operator *I* in ℋ, i.e., the operator *I*: *h* → *h*, for *h* ∈ ℋ. In operator language, the two formulas, Eqs. 1 and 2, state that each *S* _{ i } is an isometry in ℋ and that the respective ranges *S* _{ i }ℋ are mutually orthogonal, i.e., *S* _{ i }ℋ ⊥ *S* _{ j }ℋ for *i* ≠ *j*. Introducing the projections *P* _{ i } = *S* _{ i } *S* _{ i } ^{*}, we get *P* _{ i } *P* _{ j } = *δ* _{ i,j } *P* _{ i } and \( \sum_{i=0}^{N-1} P_i = I \).

In the engineering literature this takes the form of programming diagrams (illustrated for *n* = 5) (Fig. 4).
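The relations (Eqs. 1 and 2) have a finite-dimensional, periodized analogue (the Cuntz algebra itself requires an infinite-dimensional representation). The sketch below (assuming numpy; the periodization is our simplification) builds the two slanted subband matrices from the Daubechies four-tap filters and checks the isometry, range-orthogonality, and completeness relations.

```python
import numpy as np

s = 8                       # the operators map C^s into C^(2s), periodically
sq3 = np.sqrt(3.0)
# Daubechies four-tap low-pass filter and the matching high-pass filter.
h = np.array([1 + sq3, 3 + sq3, 3 - sq3, 1 - sq3]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])

def subband_matrix(c):
    """Periodized subband isometry: entry (n, k) is c_{(n - 2k) mod 2s}."""
    S = np.zeros((2 * s, s))
    for k in range(s):
        for i, ci in enumerate(c):
            S[(2 * k + i) % (2 * s), k] = ci
    return S

S0, S1 = subband_matrix(h), subband_matrix(g)

# Finite-dimensional analogues of Eqs. 1 and 2:
assert np.allclose(S0.T @ S0, np.eye(s))                   # each S_i is an isometry
assert np.allclose(S0.T @ S1, np.zeros((s, s)))            # S_i* S_j = 0 for i != j
assert np.allclose(S0 @ S0.T + S1 @ S1.T, np.eye(2 * s))   # sum of range projections = I
```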

Selecting a resolution subspace *V* _{0} = *closure span*{*φ*(⋅− *k*)|*k* ∈ ℤ}, we arrive at a wavelet subdivision {*ψ* _{ j,k }|*j* ≥ 0, *k* ∈ ℤ}, where *ψ* _{ j,k }(*x*) = 2^{ j/2} *ψ*(2^{ j } *x* − *k*), and the continuous expansion \( f={\displaystyle \sum_{j,k}}\left\langle {\psi}_{j,k}\mid f\right\rangle {\psi}_{j,k} \) or its discrete analogue derived from the isometries *S* _{ i }, *i* = 1, 2, ⋯, *N* − 1, and *S* _{0} ^{ k } *S* _{ i } for *k* = 0, 1, 2, ⋯, called the discrete wavelet transform.

*Notational convention.* In algorithms, the letter *N* is popular and often used for counting more than one thing.

In the present context of the Discrete Wavelet Algorithm (DWA), or DWT, we count two things. One is “the number of times a picture is decomposed via subdivision”; we have used *n* for this. The other, related but different, number *N* is the number of subbands: *N* = 2 for the dyadic DWT and *N* = 4 for the image DWT. The image-processing WT in our present context is the tensor product of two copies of the 1-D dyadic WT, so 2 × 2 = 4. Caution: Not all DWAs arise as tensor products of *N* = 2 models. The wavelets coming from tensor products are called separable. When a particular image-processing scheme is used for generating continuous wavelets, it is not transparent whether we are looking at a separable or inseparable wavelet!

The choice of *N* is made initially, and the same *N* is used in different runs of the programs. In contrast, the number of times a picture is decomposed varies from one experiment to the next! (Fig. 5)

**Summary**: *N* = 2 for the dyadic DWT: The operators in the representation are *S* _{0} and *S* _{1}: one average operator and one detail operator. The detail operator *S* _{1} “counts” local detail variations.

Image processing: *N* = 4 is fixed as we run different images through the DWT. The operators are now *S* _{0}, *S* _{ H }, *S* _{ V }, and *S* _{ D } – one average operator and three detail operators for local detail variations in the three directions in the plane.

### The Continuous Wavelet Transform

Next consider the continuous wavelet transform in *L* ^{2}(ℝ). To start a continuous WT, we must select a function *ψ* ∈ *L* ^{2}(ℝ) such that the following family of functions spans *L* ^{2}(ℝ):

\( {\psi}_{r,s}(x)={r}^{-\frac{1}{2}}\psi \left(\frac{x-s}{r}\right),\kern1em r\in {\mathrm{\mathbb{R}}}_{+},\ s\in \mathrm{\mathbb{R}}. \)

An over-complete family of vectors in a Hilbert space is often called a coherent decomposition. This terminology comes from quantum optics. What is needed for a continuous WT in the simplest case is the following representation, valid for all *f* ∈ *L* ^{2}(ℝ):

\( f={C}_{\psi}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\frac{ dr\; ds}{r^2}\left\langle {\psi}_{r,s}\mid f\right\rangle {\psi}_{r,s}. \)
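A numerical sketch of the coefficient map *f* ↦ ⟨*ψ* _{ r,s }|*f*⟩ (assuming numpy; the Mexican-hat wavelet and the test bump are our illustrative choices, not taken from this entry): the coefficients at a fixed scale localize the bump near its center *s* = 2.

```python
import numpy as np

def mexican_hat(x):
    """An admissible analyzing wavelet (zero mean): psi(x) = (1 - x^2) exp(-x^2/2)."""
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def cwt_coeff(f_vals, x, r, s):
    """<psi_{r,s} | f> with psi_{r,s}(x) = r**-0.5 * psi((x - s)/r), by Riemann sum."""
    psi_rs = r ** -0.5 * mexican_hat((x - s) / r)
    dx = x[1] - x[0]
    return np.sum(psi_rs * f_vals) * dx

x = np.linspace(-10.0, 14.0, 4000)
f = np.exp(-(x - 2.0) ** 2)            # a bump centered at x = 2

shifts = np.linspace(0.0, 4.0, 81)
coeffs = np.array([cwt_coeff(f, x, 1.0, s) for s in shifts])

# The transform localizes the bump: |<psi_{1,s} | f>| peaks near s = 2.
assert abs(shifts[np.argmax(np.abs(coeffs))] - 2.0) < 0.1
```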

### Some Background on Hilbert Space

The Hilbert spaces occurring here are the familiar ones. If *f* ∈ *L* ^{2}(ℝ), then \( \Vert f\Vert^2={\displaystyle \int_{\mathrm{\mathbb{R}}}}{\left|f(x)\right|}^2 dx \); if *x* = (*x* _{ n }) ∈ *ℓ* ^{2}(ℤ), then \( \Vert x\Vert^2={\displaystyle \sum_{n\in \mathrm{\mathbb{Z}}}}{\left|{x}_n\right|}^2 \). On the circle, the functions *e* _{ n }(*θ*) = *e* ^{ inθ }, *n* ∈ ℤ, form an ONB of *L* ^{2}(𝕋), and Parseval’s identity yields the isomorphism *L* ^{2}(𝕋) ≃ *ℓ* ^{2}(ℤ) used below; and for *f* ∈ *L* ^{2}(ℝ), the Fourier transform preserves the *L* ^{2} norm (Plancherel’s theorem).

Let *J* be an index set. We shall only need to consider the case when *J* is countable. Let {*ψ* _{ α }}_{ α∈J } be a family of nonzero vectors in a Hilbert space ℋ. We say it is an *orthonormal basis* (ONB) if 〈*ψ* _{ α }|*ψ* _{ β }〉 = *δ* _{ α,β } and if

\( {\displaystyle \sum_{\alpha \in J}}{\left|\left\langle {\psi}_{\alpha}\mid f\right\rangle \right|}^2=\Vert f\Vert^2\kern1em \mathrm{for\ all}\ f\in \mathrm{\mathcal{H}}. \)

If only the second condition holds, {*ψ* _{ α }}_{ α∈J } is a (normalized) *tight frame*. We say that it is a *frame* with *frame constants* 0 < *A* ≤ *B* < *∞* if

\( A\Vert f\Vert^2\le {\displaystyle \sum_{\alpha \in J}}{\left|\left\langle {\psi}_{\alpha}\mid f\right\rangle \right|}^2\le B\Vert f\Vert^2\kern1em \mathrm{for\ all}\ f\in \mathrm{\mathcal{H}}. \)

Introducing the rank-one operators *Q* _{ α } := |*ψ* _{ α }〉〈*ψ* _{ α }| of Dirac’s terminology, see Bratteli and Jorgensen (2002), we see that {*ψ* _{ α }}_{ α∈J } is an ONB if and only if the *Q* _{ α }’s are projections and \( \sum_{\alpha \in J} Q_{\alpha }=I \). It is a frame with frame constants *A* and *B* if the operator \( S:=\sum_{\alpha \in J} Q_{\alpha } \) satisfies *A* ⋅ *I* ≤ *S* ≤ *B* ⋅ *I* in the order of Hermitian operators. (Two operators *H* _{ i } = *H* _{ i } ^{*}, *i* = 1, 2, satisfy *H* _{1} ≤ *H* _{2} if 〈*f*|*H* _{1} *f*〉 ≤ 〈*f*|*H* _{2} *f*〉 holds for all *f* ∈ ℋ.) If *h*, *k* are vectors in a Hilbert space ℋ, then the operator *A* = |*h*〉〈*k*| is defined by the identity 〈*u*|*Av*〉 = 〈*u*|*h*〉 〈*k*|*v*〉 for all *u*, *v* ∈ ℋ.

Wavelet bases in *L* ^{2}(ℝ) are generated by simple operations on one or more functions *ψ* in *L* ^{2}(ℝ); the operations come in pairs, say scaling and translation, or phase modulation and translation. If *N* ∈ {2, 3, …}, we set *ψ* _{ j,k }(*x*) := *N* ^{ j/2} *ψ*(*N* ^{ j } *x* − *k*), *j*, *k* ∈ ℤ.
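The tight-frame condition can be tested in a toy example (assuming numpy; the three-vector frame in ℝ² is a standard illustration, not taken from this entry): three equiangular unit vectors form a tight frame with *A* = *B* = 3/2, so the frame operator *S* = Σ *Q* _{ α } equals (3/2)*I*, even though the family is over-complete and hence not an ONB.

```python
import numpy as np

# Three unit vectors at 120-degree angles in R^2 (the "Mercedes-Benz" frame).
angles = 2 * np.pi * np.arange(3) / 3
frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # shape (3, 2)

# Frame operator S = sum_alpha |psi_alpha><psi_alpha| as a 2x2 matrix.
S = sum(np.outer(v, v) for v in frame)

# Tight frame: S = A * I with A = B = 3/2.
assert np.allclose(S, 1.5 * np.eye(2))

# Equivalent scalar form: sum_alpha |<psi_alpha | f>|^2 = (3/2) ||f||^2 for every f.
f = np.array([0.3, -1.2])
assert np.isclose(np.sum((frame @ f) ** 2), 1.5 * (f @ f))
```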

#### Increasing the Dimension

As before, write *φ* for the father function and *ψ* for the mother function. A 1-level wavelet transform of an *N* × *M* image can be represented as

\( f\mapsto \left(\begin{array}{cc}{\mathbf{a}}^1 & {\mathbf{h}}^1\\ {}{\mathbf{v}}^1 & {\mathbf{d}}^1\end{array}\right),\kern2em (6) \)

where the subimages *h* ^{1}, *d* ^{1}, *a* ^{1}, and *v* ^{1} each have the dimension of *N*/2 by *M*/2. Here *φ* is the father function and *ψ* is the mother function in the sense of wavelets, the *V* space denotes the average space, and the *W* spaces are the difference spaces from multiresolution analysis (MRA) (Daubechies 1992).

The filter coefficients are a := (*h* _{ i }) and d := (*g* _{ i }): **a** is for averages and **d** is for local differences. They are really the input for the DWT. But they also are the key link between the two transforms: the discrete and continuous. The link is made up of the following scaling identities (normalization conventions vary in the literature):

\( \varphi (x)=2{\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{h}_i\varphi \left(2x-i\right),\kern1em \psi (x)=2{\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{g}_i\varphi \left(2x-i\right). \)

The numbers (*h* _{ i }) may be real or complex; they may be finite or infinite in number. If there are four of them, it is called the “four tap.” The finite case is best for computations since it corresponds to compactly supported functions. This means that the two functions *φ* and *ψ* will vanish outside some finite interval on the real line.

The systems *h* and *g* are the low-pass and high-pass filter coefficients, respectively. In equation (6), *a* ^{1} denotes the first averaged image, which consists of average intensity values of the original image. Note that only the *φ* function, *V* space, and *h* coefficients are used here. Similarly, *h* ^{1} denotes the first detail image of horizontal components, which consists of intensity differences along the vertical axis of the original image. Note that the *φ* function is used on *y*, the *ψ* function on *x*, the *W* space for *x* values, and the *V* space for *y* values; and both *h* and *g* coefficients are used accordingly. The data *v* ^{1} denotes the first detail image of vertical components, which consists of intensity differences along the horizontal axis of the original image. Note that the *φ* function is used on *x*, the *ψ* function on *y*, the *W* space for *y* values, and the *V* space for *x* values; and both *h* and *g* coefficients are used accordingly. Finally, *d* ^{1} denotes the first detail image of diagonal components, which consists of intensity differences along the diagonal axis of the original image; here only the *ψ* function, *W* space, and *g* coefficients are used. The original image is reconstructed from the decomposed image by taking the sum of the averaged image and the detail images and scaling by a scaling factor. See Walker (1999), Song (2006b).
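A minimal separable 1-level transform (assuming numpy; we use the Haar pair with averaging normalization, and our subband names follow the a/h/v/d layout above, though conventions for which band is called “horizontal” vary): the four subbands each have half the dimensions, and together they determine the original image.

```python
import numpy as np

def haar_pair(rows):
    """One Haar step along axis 1: averages and differences, each half the width."""
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0, (rows[:, 0::2] - rows[:, 1::2]) / 2.0

def wavelet_2d_level1(img):
    """Separable (tensor-product) 1-level transform into four subbands."""
    lo, hi = haar_pair(img)            # filter along one axis
    a1, h1 = haar_pair(lo.T)           # then along the other axis, low band
    v1, d1 = haar_pair(hi.T)           # and high band
    return a1.T, h1.T, v1.T, d1.T      # average, plus three detail subbands

img = np.arange(64, dtype=float).reshape(8, 8)
a1, h1, v1, d1 = wavelet_2d_level1(img)
assert a1.shape == h1.shape == v1.shape == d1.shape == (4, 4)  # each N/2 x M/2

# Reconstruction: each 2x2 cell of the image is a signed sum of the four subbands.
rec = np.empty_like(img)
rec[0::2, 0::2] = a1 + h1 + v1 + d1
rec[0::2, 1::2] = a1 + h1 - v1 - d1
rec[1::2, 0::2] = a1 - h1 + v1 - d1
rec[1::2, 1::2] = a1 - h1 - v1 + d1
assert np.allclose(rec, img)
```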

This decomposition is not limited to one step; it can be repeated on the averaged image, depending on the size of the image. Once it stops at a certain level, quantization (see Skodras et al. 2001; Usevitch 2001) is done on the image. This quantization step may be lossy or lossless. Then lossless entropy encoding is done on the decomposed and quantized image.

The operators *S* _{0} and *S* _{1} have equivalent matrix representations. Recall that by Parseval’s formula, we have \( {L}^2\left(\mathbb{T}\right)\simeq {\ell}^2\left(\mathrm{\mathbb{Z}}\right) \). Representing *S* _{0} instead as an *∞* × *∞* matrix acting on column vectors *x* = (*x* _{ j })_{ j∈ℤ}, we get a *slanted* matrix: each row is a copy of the filter coefficients, shifted two columns relative to the previous row (in terms of the coefficients, the (*j*, *k*) entry is determined by *h* _{ j−2k }, up to normalization). For *F* _{0} := *S* _{0} ^{*}, we get the adjoint matrix representation, which is also slanted. However, the slanting of one is the mirror image of the other.

#### Significance of Slanting

The slanted matrix representations refer to the corresponding operators in *L* ^{2}. In general, operators in Hilbert function spaces have many matrix representations, one for each orthonormal basis (ONB), but here we are concerned with the ONB consisting of the Fourier frequencies *z* ^{ j }, *j* ∈ ℤ. So in our matrix representations for the *S* operators and their adjoints, we will be acting on column vectors, each infinite column representing a vector in the sequence space *l* ^{2}. A vector in *l* ^{2} is said to be of finite size if it has only a finite set of nonzero entries.

It is the matrix representation of *F* _{0} that is effective for iterated matrix computation. Reason: when a column vector *x* of a fixed size, say 2*s*, is multiplied, or acted on, by *F* _{0}, the result is a vector *y* of half the size, i.e., of size *s*. So *y* = *F* _{0} *x*. If we use *F* _{0} and *F* _{1} together on *x*, then we get two vectors, each of size *s*: one is *y* = *F* _{0} *x*, and the other is *z* = *F* _{1} *x*; and we can form the combined column vector by stacking *y* on top of *z*. In our application, *y* represents averages, while *z* represents local differences: hence the wavelet algorithm.
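A small sketch of this size-halving (assuming numpy; Haar taps with averaging normalization, and `slanted_F` is our illustrative name): *F* _{0} and *F* _{1} are slanted, each row shifted two columns, and acting on a vector of size 2*s* produces the stacked averages-plus-differences vector.

```python
import numpy as np

def slanted_F(c, s):
    """Slanted s x 2s matrix: rows are copies of the filter c, shifted two columns."""
    F = np.zeros((s, 2 * s))
    for row in range(s):
        F[row, 2 * row: 2 * row + len(c)] = c
    return F

s = 4
F0 = slanted_F([0.5, 0.5], s)     # averages
F1 = slanted_F([0.5, -0.5], s)    # local differences

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0])   # size 2s
y, z = F0 @ x, F1 @ x             # each of size s: the size halves at every step
assert y.shape == z.shape == (s,)
assert np.allclose(y, [3.0, 5.0, 0.0, 1.0])   # pairwise averages
assert np.allclose(z, [1.0, 0.0, 1.0, -1.0])  # pairwise half-differences

stacked = np.concatenate([y, z])  # the wavelet step: averages stacked over details
assert stacked.shape == (2 * s,)
```

Iterating the same step on `y` alone reproduces the pyramid structure of the DWA.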

### Connections to Group Theory

Both the discrete and the continuous wavelet transforms are examples of *coherent vector decompositions*. Both transforms apply to vectors in a Hilbert space ℋ, and ℋ may vary from case to case. Common to all transforms is vector input and output. If the input agrees with the output, we say that the combined process yields the identity operator 1 : ℋ → ℋ, also written 1_{ℋ}. So, for example, if (*S* _{ i })_{ i=0} ^{ N−1} is a finite operator system, the input/output operator may take the form \( \sum_{i=0}^{N-1}{S}_i{S}_i^{*}=\mathbf{1} \). The tables below display such resolutions of the identity operator **1**, in *L* ^{2} or in *ℓ* ^{2}, for a wavelet *ψ* and a dual wavelet \( \tilde{\psi} \), where \( {\psi}_{r,s}(x)={r}^{-\frac{1}{2}}\psi \left(\frac{x-s}{r}\right) \).

| | Over-complete basis | Dual basis |
---|---|---|
Continuous resolution | \( {C}_{\psi}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\frac{ dr\; ds}{r^2}\left|{\psi}_{r,s}\right\rangle \left\langle {\psi}_{r,s}\right|=1 \) | \( {C}_{\psi, \tilde{\psi}}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\frac{ dr\; ds}{r^2}\left|{\psi}_{r,s}\right\rangle \left\langle {\tilde{\psi}}_{r,s}\right|=1 \) |
Discrete resolution | \( {\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}\left|{\psi}_{j,k}\right\rangle \left\langle {\psi}_{j,k}\right|=1 \), with \( {\psi}_{j,k} \) the dyadic discretization of \( {\psi}_{r,s} \) | \( {\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}\left|{\psi}_{j,k}\right\rangle \left\langle {\tilde{\psi}}_{j,k}\right|=1 \) |
Sequence spaces | \( {\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=1 \), where \( {S}_0,\dots,{S}_{N-1} \) are isometries in \( {\ell}^2 \) | \( {\displaystyle \sum_{i=0}^{N-1}}{S}_i{\tilde{S}}_i^{*}=1 \), for a dual operator system \( {S}_0,\dots, {S}_{N-1},\ {\tilde{S}}_0,\dots, {\tilde{S}}_{N-1} \) |

Equivalently, in scalar form:

| | Over-complete basis | Dual basis |
---|---|---|
Continuous resolution | \( {C}_{\psi}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\frac{ dr\; ds}{r^2}{\left|\left\langle {\psi}_{r,s}\mid f\right\rangle \right|}^2={\left\Vert f\right\Vert}_{L^2}^2\kern1em \forall f\in {L}^2\left(\mathrm{\mathbb{R}}\right) \) | \( {C}_{\psi, \tilde{\psi}}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\frac{ dr\; ds}{r^2}\left\langle f\mid {\psi}_{r,s}\right\rangle \left\langle {\tilde{\psi}}_{r,s}\mid g\right\rangle =\left\langle f\mid g\right\rangle \kern1em \forall f,g\in {L}^2\left(\mathrm{\mathbb{R}}\right) \) |
Discrete resolution | \( {\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}{\left|\left\langle {\psi}_{j,k}\mid f\right\rangle \right|}^2={\left\Vert f\right\Vert}_{L^2}^2\kern1em \forall f\in {L}^2\left(\mathrm{\mathbb{R}}\right) \) | \( {\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}\left\langle f\mid {\psi}_{j,k}\right\rangle \left\langle {\tilde{\psi}}_{j,k}\mid g\right\rangle =\left\langle f\mid g\right\rangle \kern1em \forall f,g\in {L}^2\left(\mathrm{\mathbb{R}}\right) \) |
Sequence spaces | \( {\displaystyle \sum_{i=0}^{N-1}}{\left\Vert {S}_i^{*}c\right\Vert}^2={\left\Vert c\right\Vert}^2\kern1em \forall c\in {\ell}^2 \) | \( {\displaystyle \sum_{i=0}^{N-1}}\left\langle {S}_i^{*}c\mid {\tilde{S}}_i^{*}d\right\rangle =\left\langle c\mid d\right\rangle \kern1em \forall c,d\in {\ell}^2 \) |

The term *coherent vector* comes from mathematical physics. The representation theory for the (*ax* + *b*) group, i.e., the matrix group \( G=\left\{\left(\begin{array}{cc}a & b\\ {}0 & 1\end{array}\right)\Big|\,a\in {\mathrm{\mathbb{R}}}_{+},b\in \mathrm{\mathbb{R}}\right\} \), serves as its underpinning. The tables above illustrate how the {*ψ* _{ j,k }} wavelet system arises from a discretization of the following unitary representation of *G* on *L* ^{2}(ℝ):

\( \left({U}_{\left(a,b\right)}f\right)(x)={a}^{-\frac{1}{2}}f\left(\frac{x-b}{a}\right),\kern1em f\in {L}^2\left(\mathrm{\mathbb{R}}\right). \)

This unitary representation also explains the discretization step in passing from the first line to the second in the tables above. The functions {*ψ* _{ j,k }| *j*, *k* ∈ ℤ} which make up a wavelet system result from the choice of a suitable coherent vector *ψ* ∈ *L* ^{2}(ℝ) and then setting *ψ* _{ j,k }(*x*) = 2^{ j/2} *ψ*(2^{ j } *x* − *k*), i.e., restricting (*a*, *b*) to the discrete set of points (2^{ −j }, *k*2^{ −j }), *j*, *k* ∈ ℤ.
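Unitarity of this representation can be confirmed by quadrature (assuming numpy; the Gaussian test function and the grid are illustrative choices): the change of variables *y* = (*x* − *b*)/*a* shows that *U* _{(a,b)} preserves the *L* ^{2} norm.

```python
import numpy as np

def U(a, b, f):
    """The (ax+b)-group acting on functions: (U_{(a,b)} f)(x) = a**-0.5 * f((x-b)/a)."""
    return lambda x: a ** -0.5 * f((x - b) / a)

def l2_norm_sq(g, x):
    """Riemann-sum approximation of the squared L2 norm on a uniform grid."""
    dx = x[1] - x[0]
    return np.sum(np.abs(g(x)) ** 2) * dx

f = lambda x: np.exp(-x ** 2)           # any L2 function
x = np.linspace(-40.0, 40.0, 200001)    # grid wide enough to capture f and U f

# ||U_{(a,b)} f|| = ||f|| for several group elements (a, b):
for a, b in [(2.0, 3.0), (0.5, -1.0)]:
    assert np.isclose(l2_norm_sq(U(a, b, f), x), l2_norm_sq(f, x), rtol=1e-6)
```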

Even though this representation lies at the historical origin of the subject of wavelets, the (*ax* + *b*) group seems to be now largely forgotten in the next generation of the wavelet community. But Chaps. 1–3 of Daubechies (1992) still serve as a beautiful presentation of this (now much ignored) side of the subject. It also serves as a link to mathematical physics and to classical analysis.

## Tools from Mathematics

- (a)
Operator algebras. The theory of operator algebras in turn breaks up into two parts: One is the study of “the algebras themselves” as they emerge from the axioms of von Neumann (von Neumann algebras) and of Gelfand, Kadison, and Segal (*C* *-algebras). The other has a more applied slant: It involves “the representations” of the algebras. By this, we refer to the following: The algebras will typically be specified by generators, by relations, and by a certain norm completion, in any case by a system of axioms. This holds both for the norm-closed algebras, the so-called *C* *-algebras, and for the weakly closed algebras, the von Neumann algebras. In fact there is a close connection between the two parts of the theory: For example, representations of *C* *-algebras generate von Neumann algebras.

To talk about representations of a fixed algebra, say *A*, we must specify a Hilbert space and a homomorphism *ρ* from i into the algebra ℬ(*H*) of all bounded operators on ℋ. We require that *ρ* sends the identity element in *A* into the identity operator acting on ℋ and that *ρ*(*a**) = (*ρ*(*a*))* where the last star now refers to the adjoint operator.

*C**-algebras, the Cuntz algebras. The Cuntz algebras are denoted \( {\mathcal{O}}_2,{\mathcal{O}}_3,\dots, \) including \( {\mathcal{O}}_{\infty } \).

- (b)
Dynamical systems. The connection between the Cuntz algebras \( {\mathcal{O}}_N \), *N* = 2, 3, …, and dynamical systems concerns the kind of dynamics built on branching laws, with \( {\mathcal{O}}_N \) representing *N*-fold branching. The reason for this is that if *N* is fixed, \( {\mathcal{O}}_N \) includes in its definition an iterated subdivision, but within the context of Hilbert space. For more details, see, e.g., Dutkay (2004), Dutkay and Roysland (2007a), Dutkay and Jorgensen (2005, 2006a, b, c), Jorgensen (2006b).

- (c)
Analysis of bases in function spaces. The connection to basis constructions using wavelets is this: the context for wavelets is a Hilbert space ℋ, where ℋ may be *L* ^{2}(ℝ^{ d }) where *d* is a dimension, *d* = 1 for the line (signals), *d* = 2 for the plane (images), etc. The most successful bases in Hilbert space are the orthonormal bases (ONBs), but until the mid-1980s, there were no ONBs in *L* ^{2}(ℝ^{ d }) which were entirely algorithmic and effective for computations. One reason for this is that the tools that had been used for 200 years since Fourier involved basis functions (Fourier wave functions) which were not localized. Moreover, these existing Fourier tools were not friendly to algorithmic computations.

## A Transfer Operator

A popular tool for deciding whether a candidate for a wavelet basis is in fact an ONB uses a certain transfer operator. Variants of this operator are used in diverse areas of applied mathematics. It is an operator which involves a weighted average over a finite set of possibilities, and hence it is natural for understanding random walk algorithms. As remarked in, for example, Jorgensen (2003, 2006a, b) and Dutkay (2004), it was also studied in physics, for example, by David Ruelle, who used it to prove results on phase transition for infinite spin systems in quantum statistical mechanics. In fact the transfer operator has many incarnations (many of them known as Ruelle operators), all of them based on *N*-fold branching laws.

In our wavelet application, the Ruelle operator weights the input over the *N* branch possibilities, the weighting being assigned by a chosen scalar function *W*; the *W*-Ruelle operator is denoted *R* _{ W }. In the wavelet setting there is in addition a low-pass filter function *m* _{0}, which in its frequency response formulation is a function on the *d*-torus \( {\mathbb{T}}^d={\mathrm{\mathbb{R}}}^d/{\mathrm{\mathbb{Z}}}^d \).

Since the scaling matrix *A* has integer entries, *A* passes to the quotient ℝ^{ d }/ℤ^{ d }, and the induced transformation \( {r}_A:{\mathbb{T}}^d\to {\mathbb{T}}^d \) is an *N*-fold cover, where *N* = |det *A*|, i.e., for every *x* in \( {\mathbb{T}}^d \), there are *N* distinct points *y* in \( {\mathbb{T}}^d \) solving *r* _{ A }(*y*) = *x*.

In the wavelet case, the weight function is *W* = |*m* _{0}|^{2}. With this choice of *W*, the ONB problem for a candidate wavelet basis in the Hilbert space *L* ^{2}(ℝ^{ d }) may, as it turns out, be decided by the dimension of a distinguished eigenspace for *R* _{ W }, the so-called Perron–Frobenius problem.
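As a small illustration (ours, not from the original text), the following Python sketch implements the transfer operator for the dyadic map *r*(*x*) = 2*x* mod 1 with the Haar low-pass filter and verifies that the constant function is fixed, i.e., lies in the eigenvalue-1 (Perron–Frobenius) eigenspace. The function names are ours.

```python
import numpy as np

# Transfer (Ruelle) operator R_W on functions on the circle [0,1) for the
# dyadic map r(x) = 2x mod 1 and weight W = |m0|^2, where m0 is the Haar
# low-pass filter m0(x) = (1 + e^{2 pi i x}) / 2.
# (R_W f)(x) = sum over the two preimages y of x of W(y) f(y).

def m0(x):
    return 0.5 * (1.0 + np.exp(2j * np.pi * x))

def W(x):
    return np.abs(m0(x)) ** 2           # equals cos^2(pi x)

def R_W(f, x):
    """Apply the transfer operator to a callable f at the points x."""
    y0, y1 = x / 2.0, (x + 1.0) / 2.0   # the two branches of r^{-1}
    return W(y0) * f(y0) + W(y1) * f(y1)

x = np.linspace(0.0, 1.0, 201, endpoint=False)
ones = lambda t: np.ones_like(t)

# The constant function 1 is a Perron-Frobenius eigenfunction: R_W 1 = 1,
# because |m0(x/2)|^2 + |m0((x+1)/2)|^2 = 1 for the Haar filter.
print(np.max(np.abs(R_W(ones, x) - 1.0)))
```

The printed deviation is zero up to rounding, confirming the eigenvalue-1 relation for this filter.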

This has worked well for years for the wavelets which have an especially simple algorithm, those initialized by a single function, called the scaling function. These are the multiresolution analysis wavelets, or MRA wavelets for short. But there are instances, for example, when a problem must be localized in the frequency domain, where the MRA wavelets do not suffice and one must by necessity include more than one scaling function. We are then back to deciding whether the output from the discrete algorithm and the \( {\mathcal{O}}_N \) representation is an ONB, or whether it has some stability property which will serve the same purpose in cases where asking for an ONB is not feasible.

## Future Directions

One direction for future research concerns wavelet constructions on Julia sets, i.e., on the fractals arising from iteration of the maps *φ* _{ c }(*z*) = *z* ^{2} + *c*, where *z* is a complex variable and where *c* is a fixed parameter. The corresponding Julia sets *J* _{ c } have a surprisingly rich structure. A simple way to understand them is the following: consider the two branches of the inverse, \( {\beta}_{\pm }:z\mapsto \pm \sqrt{z-c} \). Then *J* _{ c } is the unique minimal nonempty compact subset of ℂ which is invariant under {*β* _{±}}. (There are alternative ways of presenting *J* _{ c }, but this one fits our purpose. The Julia set *J* of a holomorphic function, in this case *z* ↦ *z* ^{2} + *c*, informally consists of those points whose long-time behavior under repeated iteration, or rather iteration of substitutions, can change drastically under arbitrarily small perturbations.) Here “long time” refers to large *n*, where *φ* ^{(n+1)}(*z*) = *φ*(*φ* ^{(n)}(*z*)), *n* = 0, 1, … , and *φ* ^{(0)}(*z*) = *z* (Figs. 6 and 7).

It would be interesting to adapt and modify the Haar wavelet and the other wavelet algorithms to Julia sets. The two papers Dutkay and Jorgensen (2005, 2006b) initiated such a development. An attempt to adapt the Haar wavelet to Julia sets was then made in Dutkay et al. (2012); however, there were some limitations in finding the filters. Perhaps other fractal sets, such as the one generated by the tent map, will work better.
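As an informal illustration (ours, not from the original), the invariant-set description of *J* _{ c } under the inverse branches *β* _{±} suggests a simple sampling algorithm: iterate a randomly chosen branch, and the orbit accumulates on the Julia set.

```python
import random
import cmath

# Inverse-iteration sketch for the Julia set J_c of phi_c(z) = z^2 + c:
# repeatedly apply a randomly chosen branch of the inverse map
# beta_pm(z) = +/- sqrt(z - c); the orbit accumulates on J_c.

def julia_points(c, n_points=20000, burn_in=50, seed=0):
    rng = random.Random(seed)
    z = complex(1.0, 0.0)          # almost any starting point works
    points = []
    for k in range(n_points + burn_in):
        w = cmath.sqrt(z - c)      # one branch of the inverse
        z = w if rng.random() < 0.5 else -w   # pick beta_+ or beta_-
        if k >= burn_in:           # discard the transient iterates
            points.append(z)
    return points

pts = julia_points(c=-1.0)         # c = -1: the "basilica" Julia set
print(len(pts))
```

For *c* = −1 all iterates stay in the disk of radius 2, since |*β* _{±}(*z*)| ≤ √(|*z*| + 1); plotting the points reveals the familiar basilica shape.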

### Orthonormal Bases Generated by Cuntz Algebras

We present recent results from Dutkay et al. (2012), borrowing section “Introduction” and part of section “Definition” of that paper for the remainder of this section. The main result gives a general criterion for a family generated by the Cuntz isometries to be an orthonormal basis.

### Theorem 1

*Let* ℋ *be a Hilbert space and* (*S* _{ i }) _{ i=0} ^{ N−1} *be a representation of the Cuntz algebra* \( {\mathcal{O}}_N \)*. Let* ℰ *be an orthonormal set in* ℋ *and f* : *X* → ℋ *a norm continuous function on a topological space X with the following properties:*

- (i)
\( \mathrm{\mathcal{E}}={{\displaystyle \cup}}_{i=0}^{N-1}{S}_i\mathrm{\mathcal{E}}. \)

- (ii)
\( \overline{\mathrm{span}}\left\{f(t):t\in X\right\}=\mathrm{\mathcal{H}} \) *and* ||*f*(*t*)|| = 1, *for all t* ∈ *X*.

- (iii)
*There exist functions* \( {\mathfrak{m}}_i:X\to \mathrm{\mathbb{C}} \), *g* _{ i } : *X* → *X*, *i* = 0, … , *N* − 1 *such that*
$$ {S}_i^{*}f(t)={\mathfrak{m}}_i(t)f\left({g}_i(t)\right),\kern1em t\in X. $$(9)

- (iv)
*There exists c* _{0} ∈ *X such that* \( f\left({c}_0\right)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}. \)

- (v)
*The only function* \( h\in \mathcal{C}(X) \) *with h* ≥ 0, *h*(*c*) = 1, \( \forall c\in \left\{x\in X:f(x)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}\right\} \), *and*
$$ h(t)={\displaystyle \sum_{i=0}^{N-1}}{\left|{\mathfrak{m}}_i(t)\right|}^2h\left({g}_i(t)\right),\kern1em t\in X, $$(10)
*is the constant function* 1.

*Then* ℰ *is an orthonormal basis for* ℋ.

### Proof

Let *P* be the orthogonal projection onto the closed linear span of ℰ, and define *h*(*t*) := ||*Pf*(*t*)||^{2}, *t* ∈ *X*. Since *t* ↦ *f*(*t*) is norm continuous, *h* is continuous. Clearly *h* ≥ 0. Also, if \( f(c)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}} \), then ||*Pf*(*c*)|| = ||*f*(*c*)|| = 1, so *h*(*c*) = 1. In particular, from (ii) and (iv), *h*(*c* _{0}) = 1. We check (Eq. 10). Since the sets *S* _{ i }ℰ, *i* = 0, … , *N* − 1, are mutually orthogonal, the union in (i) is disjoint. Therefore, for all *t* ∈ *X*,

$$ h(t)={\left\Vert Pf(t)\right\Vert}^2={\displaystyle \sum_{i=0}^{N-1}}{\left\Vert P{S}_i^{*}f(t)\right\Vert}^2={\displaystyle \sum_{i=0}^{N-1}}{\left|{\mathfrak{m}}_i(t)\right|}^2h\left({g}_i(t)\right). $$

By (v), *h* is constant and, since *h*(*c* _{0}) = 1, *h*(*t*) = 1 for all *t* ∈ *X*. Then ||*Pf*(*t*)|| = 1 for all *t* ∈ *X*. Since ||*f*(*t*)|| = 1, it follows that \( f(t)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}} \) for all *t* ∈ *X*. But the vectors *f*(*t*) span ℋ, so \( \overline{\mathrm{span}}\mathrm{\mathcal{E}}=\mathrm{\mathcal{H}} \) and ℰ is an orthonormal basis.

### Remark 2

#### Piecewise Exponential Bases on Fractals

### Example 3

Dutkay et al. (2012) Let *R* be a *d* × *d* expansive real matrix, i.e., all the eigenvalues of *R* have absolute value strictly greater than 1. Let *B* ⊂ ℝ^{ d } be a finite set and let *N* = |*B*|. Define the affine iterated function system

$$ {\tau}_b(x)={R}^{-1}\left(x+b\right),\kern1em x\in {\mathrm{\mathbb{R}}}^d,\kern0.5em b\in B. $$

There is a unique compact subset *X* _{ B } of ℝ^{ d } which satisfies the invariance equation

$$ {X}_B={\displaystyle \underset{b\in B}{\cup }}{\tau}_b\left({X}_B\right); $$

*X* _{ B } is called the attractor of the iterated function system (*τ* _{ b }) _{ b∈B }. Moreover, *X* _{ B } is given by

$$ {X}_B=\left\{{\displaystyle \sum_{k=1}^{\infty }}{R}^{-k}{b}_k:{b}_k\in B\right\}. $$

There is also a unique Borel probability measure *μ* _{ B } on ℝ^{ d } satisfying the invariance equation

$$ {\displaystyle \int }f\kern0.1em d{\mu}_B=\frac{1}{N}{\displaystyle \sum_{b\in B}}{\displaystyle \int }f\circ {\tau}_b\kern0.1em d{\mu}_B $$

for all continuous compactly supported functions *f* on ℝ^{ d }. We call *μ* _{ B } the invariant measure for the iterated function system (IFS) (*τ* _{ b }) _{ b∈B }. By Hutchinson (1981), *μ* _{ B } is supported on the attractor *X* _{ B }. We say that the IFS has no overlap if *μ* _{ B }(*τ* _{ b }(*X* _{ B }) ∩ *τ* _{ b′}(*X* _{ B })) = 0 for all *b* ≠ *b*′ in *B*.

Assume now that the IFS (*τ* _{ b }) _{ b∈B } has no overlap. Define the map *r* : *X* _{ B } → *X* _{ B },

$$ r(x)=Rx-b,\kern1em \mathrm{if}\kern0.5em x\in {\tau}_b\left({X}_B\right). $$

Then *r* is an *N* to 1 onto map and *μ* _{ B } is strongly invariant for *r*. Note that *r* ^{− 1}(*x*) = {*τ* _{ b }(*x*) : *b* ∈ *B*} for *μ* _{ B } a.e. *x* ∈ *X* _{ B }.
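As an illustration (not part of the original text), the invariant measure *μ* _{ B } can be sampled by random iteration of the maps *τ* _{ b }, the so-called chaos game. The sketch below uses *R* = 3, *B* = {0, 2}, whose attractor is the middle-third Cantor set.

```python
import random

# Chaos-game sketch for the affine IFS tau_b(x) = (x + b) / R with
# R = 3, B = {0, 2}: random iteration samples the invariant measure
# mu_B, here supported on the middle-third Cantor set.

R, B = 3, [0, 2]

def chaos_game(n=100000, seed=1):
    rng = random.Random(seed)
    x, samples = 0.5, []
    for _ in range(n):
        b = rng.choice(B)
        x = (x + b) / R            # apply a randomly chosen branch
        samples.append(x)
    return samples

pts = chaos_game()
# No-overlap structure: the two pieces tau_0(X_B) in [0, 1/3] and
# tau_2(X_B) in [2/3, 1] each carry measure 1/2.
left = sum(1 for x in pts if x <= 1/3) / len(pts)
print(round(left, 2))
```

All samples land in [0, 1/3] ∪ [2/3, 1], and the empirical mass of the left piece is close to 1/2, as the invariance equation for *μ* _{ B } predicts.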

We apply Theorem 1 to the setting of Example 3, in dimension *d* = 1 for affine iterated function systems, when the set \( \frac{1}{R}B \) has a spectrum *L* (Dutkay et al. 2012).

### Definition 4

Dutkay et al. (2012) Let *B* ⊂ ℝ with |*B*| = *N*, let *L* be a finite set in ℝ with |*L*| = *N*, and let *R* > 1 be such that *L* is a spectrum for the set \( \frac{1}{R}B \). We say that *c* ∈ ℝ is an *extreme cycle point* for (*B*, *L*) if there exist *l* _{0}, *l* _{1}, … , *l* _{ p−1} in *L* such that, if *c* _{0} = *c*, \( {c}_1=\frac{c_0+{l}_0}{R},{c}_2=\frac{c_1+{l}_1}{R},\dots, {c}_{p-1}=\frac{c_{p-2}+{l}_{p-2}}{R}, \) then \( \frac{c_{p-1}+{l}_{p-1}}{R}={c}_0 \), and |*m* _{ B }(*c* _{ i })| = 1 for *i* = 0, … , *p* − 1, where

$$ {m}_B(x)=\frac{1}{N}{\displaystyle \sum_{b\in B}}{e}^{2\pi ibx}. $$

### Proposition 5

*Let* (*m* _{ i }) _{ i=0} ^{ N−1} *be a QMF basis. Define the operators on L* ^{2}(*X*, *μ*)*:*

$$ {S}_if={m}_i\cdot \left(f\circ r\right),\kern1em i=0,\dots, N-1. $$(16)

*Then the operators S* _{ i } *are isometries and they form a representation of the Cuntz algebra* \( {\mathcal{O}}_N \)*, i.e.,*

$$ {S}_i^{*}{S}_j={\delta}_{ij}I,\kern1em {\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=I. $$(17)

*The adjoint of S* _{ i } *is given by the formula*

$$ \left({S}_i^{*}f\right)(x)=\frac{1}{N}{\displaystyle \sum_{r(y)=x}}\overline{m_i(y)}f(y). $$(18)

### Proof

We compute ⟨*S* _{ i } *f*, *g*⟩ for *f*, *g* in *L* ^{2}(*X*, *μ*), using the strong invariance of *μ*:

$$ \left\langle {S}_if,g\right\rangle ={\displaystyle \int }{m}_i(x)f\left(r(x)\right)\overline{g(x)}\kern0.1em d\mu (x)={\displaystyle \int}\frac{1}{N}{\displaystyle \sum_{r(y)=x}}{m}_i(y)f(x)\overline{g(y)}\kern0.1em d\mu (x)={\displaystyle \int }f(x)\overline{\frac{1}{N}{\displaystyle \sum_{r(y)=x}}\overline{m_i(y)}g(y)}\kern0.1em d\mu (x). $$

Then (Eq. 18) follows. The Cuntz relations in (Eq. 17) are then easily checked.

### Definition 6

Dutkay et al. (2012) We denote by *L** the set of all finite words with digits in *L*, including the empty word. For *l* ∈ *L* let *S* _{ l } be given as in (Eq. 16) where *m* _{ l } is replaced by the exponential *e* _{ l }. If *w* = *l* _{1} *l* _{2} … *l* _{ n } ∈ *L** then by *S* _{ w } we denote the composition \( {S}_{l_1}{S}_{l_2}\dots {S}_{l_n} \).

### Theorem 7

Dutkay et al. (2012) *Let B* ⊂ ℝ, 0 ∈ *B*, |*B*| = *N*, *R* > 1*, and let μ* _{ B } *be the invariant measure associated to the IFS τ* _{ b }(*x*) = *R* ^{−1}(*x* + *b*)*, b* ∈ *B. Assume that the IFS has no overlap and that the set* \( \frac{1}{R}B \) *has a spectrum L* ⊂ ℝ, 0 ∈ *L. Then the set*

$$ \mathrm{\mathcal{E}}(L)=\left\{{S}_w{e}_{-c}:w\in {L}^{*},\ c\ \mathrm{an}\ \mathrm{extreme}\ \mathrm{cycle}\ \mathrm{point}\ \mathrm{for}\ \left(B,L\right)\right\} $$

*is an orthonormal basis in L* ^{2}(*μ* _{ B })*. Some of the vectors in* ℰ(*L*) *are repeated, but we count them only once.*

### Proof

Let *c* be an extreme cycle point. Then |*m* _{ B }(*c*)| = 1. Using the fact that we have equality in the triangle inequality \( \left(1=\left|{m}_B(c)\right|\le \frac{1}{N}{\displaystyle \sum_{b\in B}}\left|{e}^{2\pi ibc}\right|=1\right) \), and since 0 ∈ *B*, we get that *e* ^{2πibc } = 1, so *bc* ∈ ℤ for all *b* ∈ *B*. Also there exists another extreme cycle point *d* and *l* ∈ *L* such that \( \frac{d+l}{R}=c \). Then we have *S* _{ l } *e* _{−c }(*x*) = *e* ^{2πilx } *e* ^{2πi(Rx−b)(−c)} if *x* ∈ *τ* _{ b }(*X* _{ B }). Since *bc* ∈ ℤ and *R*(−*c*) + *l* = −*d*, we obtain

$$ {S}_l{e}_{-c}={e}_{-d}. $$(19)

Next we show that the vectors *S* _{ w } *e* _{−c }, \( {S}_{w^{\prime }}{e}_{-{c}^{\prime }} \) are either equal or orthogonal for *w*, *w*′ in *L** and *c*, *c*′ extreme cycle points for (*B*, *L*). Using (Eq. 19), we can append some letters at the end of *w* and *w*′ such that the new words have the same length:

$$ {S}_w{e}_{-c}={S}_{w\alpha}{e}_{-d},\kern1em {S}_{w^{\prime }}{e}_{-{c}^{\prime }}={S}_{w^{\prime}\beta }{e}_{-{d}^{\prime }},\kern1em \left|w\alpha \right|=\left|{w}^{\prime}\beta \right|. $$

Moreover, repeating the letters for the cycle points *d* and *d*′ as many times as we want, we can assume that *α* ends in a repetition of the letters associated to *d*, and similarly for *β* and *d*′. But since |*wα*| = |*w*′*β*|, the Cuntz relations imply that \( {S}_{w\alpha}{e}_{-d}\perp {S}_{w^{\prime}\beta }{e}_{-{d}^{\prime }} \) or *wα* = *w*′*β*. Assume |*w*| ≤ |*w*′|. Then *α* = *w*″*β* for some word *w*″, and \( {S}_{w\alpha}{e}_{-d}={S}_{w^{\prime}\beta }{e}_{-{d}^{\prime }} \) iff \( {S}_{\alpha }{e}_{-d}={S}_{w^{\prime\prime}\beta }{e}_{-{d}^{\prime }} \). Also, *α* consists of repetitions of the digits of the cycle associated to *d*, and similarly for *d*′. So \( {S}_{\alpha }{e}_{-d}={e}_{-f} \), \( {S}_{w^{\prime\prime}\beta }{e}_{-{d}^{\prime }}={e}_{-{f}^{\prime }} \), where the points *d*, *d*′, *f*, *f*′, *c*, *c*′ all belong to extreme cycles. So the only case when *S* _{ w } *e* _{−c } is not orthogonal to \( {S}_{w^{\prime }}{e}_{-{c}^{\prime }} \) is when they are equal.

Now we check the conditions of Theorem 1 for *X* = ℝ, ℰ = ℰ(*L*), and *f*(*t*) = *e* _{−t } ∈ *L* ^{2}(*μ* _{ B }). To check (i) we just have to see that *e* _{−c } ∈ ∪ _{ l ∈ L } *S* _{ l }ℰ(*L*), and this follows from (Eq. 19). Requirement (ii) is clear. For (iii), we compute

$$ {S}_l^{*}{e}_{-t}=\overline{m_B}\left(\frac{t+l}{R}\right){e}_{-\frac{t+l}{R}}. $$

So (iii) is satisfied with \( {\mathfrak{m}}_l(t)=\overline{m_B}\left(\frac{t+l}{R}\right) \), \( {g}_l(t)=\frac{t+l}{R} \).

Condition (iv) is satisfied with *c* _{0} = −*c* for any extreme cycle point *c* (0 is always one). For (v), take *h* continuous on ℝ, 0 ≤ *h* ≤ 1, *h*(*c*) = 1 for all *c* with \( {e}_{-c}\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}(L) \), and

$$ h(t)={\displaystyle \sum_{l\in L}}{\left|{\mathfrak{m}}_l(t)\right|}^2h\left({g}_l(t)\right),\kern1em t\in \mathrm{\mathbb{R}}. $$

In particular, we have *h*(*c*) = 1 for every extreme cycle point *c*. Assume \( h\kern.5em \not\equiv 1 \). First, we restrict our attention to *t* ∈ *I* := [*a*, *b*] with \( a\le \frac{ \min L}{R-1} \), \( b\ge \frac{ \max L}{R-1} \), and note that *g* _{ l }(*I*) ⊂ *I* for all *l* ∈ *L*. Let *m* = min_{ t ∈ I } *h*(*t*) and assume *m* < 1. Let *h*′ = *h* − *m*. Then *h*′ satisfies the same invariance equation as *h*, *h*′ ≥ 0 on *I*, and *h*′ has a zero *z* _{0} in *I*. But this implies that |*m* _{ B }(*g* _{ l }(*z* _{0}))|^{2} *h*′(*g* _{ l }(*z* _{0})) = 0 for all *l* ∈ *L*. Since ∑ _{ l ∈ L }|*m* _{ B }(*g* _{ l }(*z* _{0}))|^{2} = 1, it follows that for one of the *l* _{0} ∈ *L*, we have \( {h}^{\prime}\left({g}_{l_0}\left({z}_0\right)\right)=0 \). By induction, we can find \( {z}_n={g}_{l_{n-1}}\cdots {g}_{l_0}{z}_0 \) such that *h*′(*z* _{ n }) = 0. We prove that *z* _{0} is a cycle point. Suppose not. Since *m* _{ B } has finitely many zeros in *I*, for *n* large enough \( {g}_{\alpha_k}\cdots {g}_{\alpha_1}{z}_n \) is not a zero of *m* _{ B }, for any choice of digits *α* _{1}, …, *α* _{ k } in *L*. But then, by the same argument as above, we get that \( {h}^{\prime}\left({g}_{\alpha_k}\cdots \kern.3em {g}_{\alpha_1}{z}_n\right)=0 \) for any *α* _{1}, …, *α* _{ k } ∈ *L*. The points \( \left\{{g}_{\alpha_k}\cdots \kern.3em {g}_{\alpha_1}{z}_n:{\alpha}_1,\dots, {\alpha}_k\in L,k\in \mathrm{\mathbb{N}}\right\} \) are dense in the attractor *X* _{ L } of the IFS {*g* _{ l }}_{ l ∈ L }; thus, *h*′ is constant 0 on *X* _{ L }. But the extreme cycle points *c* are in *X* _{ L }, and since *h*(*c*) = 1, we have 0 = *h*′(*c*) = 1 − *m*, so *m* = 1, a contradiction. Thus, *h* = 1 on *I*. Since we can let *a* → −*∞* and *b* → *∞*, we obtain *h* ≡ 1.

### Remark 8

The functions in ℰ(*L*) are piecewise exponential. The formula for \( {S}_{l_1\dots {l}_n}{e}_{-c} \) is

$$ {S}_{l_1\dots {l}_n}{e}_{-c}(x)={e}^{2\pi i\alpha \left(b,l,c\right)}{e}^{2\pi i\left({l}_1+R{l}_2+\dots +{R}^{n-1}{l}_n-{R}^nc\right)x},\kern1em x\in {\tau}_{b_1}\dots {\tau}_{b_n}{X}_B, $$

where *α*(*b*, *l*, *c*) = −[*b* _{1} *l* _{2} + (*Rb* _{1} + *b* _{2}) *l* _{3} + … + (*R* ^{ n−2} *b* _{1} + … + *b* _{ n−1}) *l* _{ n }] + (*R* ^{ n−1} *b* _{1} + … + *b* _{ n })⋅*c*. Indeed, for \( x\in {\tau}_{b_1}\dots {\tau}_{b_n}{X}_B \) we have

$$ {r}^k(x)={R}^kx-\left({R}^{k-1}{b}_1+\dots +R{b}_{k-1}+{b}_k\right),\kern1em k=1,\dots, n. $$

The rest follows from a direct computation.

### Corollary 9

Dutkay et al. (2012) *In the hypotheses of Theorem 7, if in addition B*, *L* ⊂ ℤ *and R* ∈ ℤ*, then there exists a set Λ such that* {*e* _{ λ } : *λ* ∈ *Λ*} *is an orthonormal basis for L* ^{2}(*μ* _{ B })*.*

### Proof

If everything is an integer, it follows from Remark 8 that *S* _{ w } *e* _{−c } is an exponential function for all words *w* and extreme cycle points *c*; note that, as in the proof of Theorem 7, *bc* ∈ ℤ for all *b* ∈ *B*.

### Example 10

Dutkay et al. (2012) We consider the IFS that generates the middle third Cantor set: *R* = 3, *B* = {0, 2}. The set \( \frac{1}{3}\left\{0,2\right\} \) has spectrum *L* = {0, 3/4}. We look for the extreme cycle points for (*B*, *L*).

We need |*m* _{ B }(−*c*)| = 1, so \( \left|\frac{1+{e}^{2\pi i2c}}{2}\right|=1 \); therefore, \( c\in \frac{1}{2}\mathrm{\mathbb{Z}} \). Also *c* has to be a cycle point for the IFS *g* _{0}(*x*) = *x*/3, \( {g}_{3/4}(x)=\frac{x+3/4}{3} \), so \( 0\le c\le \frac{3/4}{3-1}=3/8 \). Thus, the only extreme cycle is {0}. By Theorem 7, ℰ = {*S* _{ w }1 : *w* ∈ {0, 3/4}*} is an orthonormal basis for *L* ^{2}(*μ* _{ B }). Note also that the numbers *e* ^{2πiα(b,l,c)} in the formula of Remark 8 are ± 1, because 2*B* ⋅ *L* ⊂ ℤ.
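The spectrum claim in Example 10 can be checked numerically. The sketch below (an illustration of ours) verifies that the 2 × 2 matrix \( \frac{1}{\sqrt{2}}{\left({e}^{2\pi i\left(b/3\right)l}\right)}_{b\in B,l\in L} \) is unitary, which is exactly the condition that *L* = {0, 3/4} is a spectrum for \( \frac{1}{3}B \).

```python
import numpy as np

# Example 10 check: L = {0, 3/4} is a spectrum for (1/3){0, 2} exactly
# when the matrix (1/sqrt(2)) (e^{2 pi i (b/3) l})_{b in B, l in L}
# is unitary (a complex Hadamard condition).

B, L, R = [0, 2], [0.0, 0.75], 3
H = np.array([[np.exp(2j * np.pi * (b / R) * l) for l in L] for b in B])
H /= np.sqrt(len(B))
print(np.allclose(H.conj().T @ H, np.eye(2)))
```

The entries are (1/√2)·[[1, 1], [1, −1]], since e^{2πi(2/3)(3/4)} = e^{πi} = −1, so the matrix is unitary and the check prints True.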

#### Walsh Bases

In the following, we will focus on the unit interval, which can be regarded as the attractor of a simple IFS and we use step functions for the QMF basis to generate Walsh-type bases for *L* ^{2}[0, 1] (Dutkay et al. 2012).

### Example 11

Dutkay et al. (2012) The interval [0, 1] is the attractor of the IFS \( {\tau}_0x=\frac{x}{2},{\tau}_1x=\frac{x+1}{2} \), and the invariant measure is the Lebesgue measure on [0, 1]. The map *r* defined in Example 3 is *rx* = 2*x* mod 1. Let *m* _{0} = 1, *m* _{1} = *χ* _{[0,1/2)} − *χ* _{[1/2,1)}. It is easy to see that {*m* _{0}, *m* _{1}} is a QMF basis. Therefore, *S* _{0}, *S* _{1}, defined as in Proposition 5, form a representation of the Cuntz algebra \( {\mathcal{O}}_2 \).
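To make Example 11 concrete, here is a small numerical sketch of ours (not from the paper). On step functions represented by their values on 2^{ k } dyadic intervals, the two Cuntz isometries act as *S* _{0} *f* = [*f*, *f*] and *S* _{1} *f* = [*f*, −*f*], and the Gram matrix of {*S* _{ w }1} is the identity.

```python
import itertools
import numpy as np

# For Example 11 (m0 = 1, m1 = chi_[0,1/2) - chi_[1/2,1), r(x) = 2x mod 1),
# the Cuntz isometries act on step functions (vectors of values on dyadic
# intervals) by S_0 f = f o r = [f, f] and S_1 f = m1 * (f o r) = [f, -f].

def S(i, f):
    f = np.asarray(f, dtype=float)
    return np.concatenate([f, f]) if i == 0 else np.concatenate([f, -f])

def S_word(w):
    """S_w 1 as a step function on 2^len(w) dyadic intervals."""
    f = np.array([1.0])
    for i in reversed(w):          # S_w = S_{w_1} S_{w_2} ... S_{w_n}
        f = S(i, f)
    return f

n = 3
basis = [S_word(w) for w in itertools.product([0, 1], repeat=n)]
# Gram matrix in L^2[0,1]: <u, v> = (1/2^n) sum u v. It is the identity,
# so {S_w 1} is orthonormal: these are the first 2^n Walsh functions.
G = np.array([[u @ v / 2**n for v in basis] for u in basis])
print(np.allclose(G, np.eye(2**n)))
```

The vectors produced are exactly the rows of the *n*-fold tensor of (1, 1) and (1, −1), i.e., the Walsh functions at resolution 2^{ n }.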

### Proposition 12

Dutkay et al. (2012) *The set* ℰ := {*S* _{ w }1 : *w* ∈ {0, 1}*} *is an orthonormal basis for L* ^{2}[0, 1]*, the Walsh basis.*

### Proof

We check the conditions of Theorem 1. We have *S* _{0}1 = 1, which gives (i). Define *f*(*t*) = *e* _{ t }, *t* ∈ ℝ. (ii) is clear. For (iii), we compute

$$ {S}_0^{*}{e}_t=\frac{1}{2}\left(1+{e}^{2\pi it/2}\right){e}_{t/2},\kern1em {S}_1^{*}{e}_t=\frac{1}{2}\left(1-{e}^{2\pi it/2}\right){e}_{t/2}. $$

Thus, (iii) holds with \( {\mathfrak{m}}_0(t)=\frac{1}{2}\left(1+{e}^{2\pi it/2}\right) \), \( {\mathfrak{m}}_1(t)=\frac{1}{2}\left(1-{e}^{2\pi it/2}\right) \), \( {g}_0(t)={g}_1(t)=\frac{t}{2} \). Since *e* _{0} = 1, it follows that (iv) holds.

For (v), take *h* continuous on ℝ, 0 ≤ *h* ≤ 1, *h*(*c*) = 1 for all *c* ∈ ℝ with \( {e}_c\in \overline{\mathrm{span}}\mathrm{\mathcal{E}} \) (in particular *h*(0) = 1), and \( h(t)={\left|{\mathfrak{m}}_0(t)\right|}^2h\left(t/2\right)+{\left|{\mathfrak{m}}_1(t)\right|}^2h\left(t/2\right)=h\left(t/2\right) \). Then *h*(*t*) = *h*(*t*/2^{ n }) for all *t* ∈ ℝ, *n* ∈ ℕ. Letting *n* → *∞* and using the continuity of *h*, we get *h*(*t*) = *h*(0) = 1 for all *t* ∈ ℝ. Since all conditions hold, ℰ is an orthonormal basis. That ℰ is actually the Walsh basis follows from the following calculation: for a word *w* = *w* _{1} … *w* _{ n } in {0, 1}*, corresponding to an integer with binary digits *w* _{ i }, since *S* _{0} *f* = *f* ∘ *r*, *S* _{1} *f* = *m* _{1} ⋅ (*f* ∘ *r*), and *m* _{0} ≡ 1, we obtain the decomposition

$$ {S}_w1(x)={\displaystyle \prod_{i:{w}_i=1}}{m}_1\left({r}^{i-1}x\right). $$

Also *m* _{1}(*r* ^{ i } *x*) = *m* _{1}(2^{ i } *x* mod 1) are the Rademacher functions, and thus we obtain the Walsh basis (see, e.g., Schipp et al. 1990).

This construction can be generalized by replacing the functions *m* _{0}, *m* _{1} with the filters coming from an arbitrary unitary matrix *A* with constant first row, and by changing the scale from 2 to *N*.

### Theorem 13

*Let N* ∈ ℕ, *N* ≥ 2*. Let A* = [*a* _{ ij }] *be an N* × *N unitary matrix whose first row is constant* \( \frac{1}{\sqrt{N}} \)*. Consider the IFS* \( {\tau}_jx=\frac{x+j}{N},x\in \mathrm{\mathbb{R}},j=0,\dots, N-1, \) *with attractor* [0, 1] *and invariant measure the Lebesgue measure on* [0, 1]*. Define*

$$ {m}_i=\sqrt{N}{\displaystyle \sum_{j=0}^{N-1}}{a}_{ij}{\chi}_{\left[j/N,\left(j+1\right)/N\right)},\kern1em i=0,\dots, N-1. $$

*Then* {*m* _{ i }} _{ i = 0} ^{ N − 1} *is a QMF basis. Consider the associated representation of the Cuntz algebra* \( {\mathcal{O}}_N \)*. Then the set* ℰ := {*S* _{ w }1 : *w* ∈ {0, …, *N* − 1}*} *is an orthonormal basis for L* ^{2}[0, 1]*.*

### Proof

We check the conditions in Theorem 1. Let *f*(*t*) = *e* _{ t }, *t* ∈ ℝ.

We have *S* _{0}1 ≡ 1, since *m* _{0} ≡ 1; this gives (i). (ii) is clear. For (iii), we compute

$$ {S}_k^{*}{e}_t=\left(\frac{1}{\sqrt{N}}{\displaystyle \sum_{j=0}^{N-1}}\overline{a_{kj}}{e}^{2\pi it\cdot j/N}\right){e}_{t/N}. $$

So (iii) is true with \( {\mathfrak{m}}_k(t)=\frac{1}{\sqrt{N}}{\displaystyle \sum_{j=0}^{N-1}}\overline{a_{kj}}{e}^{2\pi it\cdot j/N} \) and \( {g}_k(t)=\frac{t}{N} \).

Condition (iv) is satisfied with *c* _{0} = 0. For (v), take \( h\in \mathcal{C}\left(\mathrm{\mathbb{R}}\right),0\le h\le 1,h(c)=1 \) for all *c* ∈ ℝ with \( {e}_c\in \overline{\mathrm{span}}\mathrm{\mathcal{E}} \) (in particular *h*(0) = 1), and

$$ h(t)={\displaystyle \sum_{k=0}^{N-1}}{\left|{\mathfrak{m}}_k(t)\right|}^2h\left(t/N\right)=\frac{1}{N}{\left\Vert Av\right\Vert}^2h\left(t/N\right), $$

where *v* = (*e* ^{− 2πit ⋅ j/N }) _{ j = 0} ^{ N − 1}. Since *A* is unitary, ||*Av*||^{2} = ||*v*||^{2} = *N*. Then *h*(*t*) = *h*(*t*/*N* ^{ n }). Letting *n* → *∞* and using the continuity of *h*, we obtain that *h*(*t*) = 1 for all *t* ∈ ℝ. Thus, Theorem 1 implies that ℰ is an orthonormal basis.

### Remark 14

Dutkay et al. (2012) We can read the constants that appear in the step function *S* _{ w }1 from the tensor of *A* with itself *n* times, where *n* is the length of the word *w*.

Recall the definition of the tensor (Kronecker) product of matrices: let *A* be an *N* × *N* matrix and *B* an *M* × *M* matrix. Then *A* ⊗ *B* has entries

$$ {\left(A\otimes B\right)}_{\left({i}_1,{i}_2\right),\left({j}_1,{j}_2\right)}={a}_{i_1{j}_1}{b}_{i_2{j}_2}. $$

The matrix *A* ^{⊗n } is obtained by induction, tensoring to the left: *A* ^{⊗n } = *A* ⊗ *A* ^{⊗ (n−1)}. Thus *A* ⊗ *A* ⊗ … ⊗ *A*, *n* times, has entries, for *i* _{0}, …, *i* _{ n−1}, *j* _{0}, …, *j* _{ n−1} ∈ {0, …, *N* − 1}:

$$ {\left({A}^{\otimes n}\right)}_{\left({i}_0,\dots, {i}_{n-1}\right),\left({j}_0,\dots, {j}_{n-1}\right)}={a}_{i_0{j}_0}{a}_{i_1{j}_1}\cdots {a}_{i_{n-1}{j}_{n-1}}. $$

Suppose \( x\in \left[\frac{k}{N^n},\frac{k+1}{N^n}\right),0\le k<{N}^n \), and *k* = *N* ^{ n − 1} *j* _{0} + *N* ^{ n − 2} *j* _{1} + … + *Nj* _{ n−2} + *j* _{ n−1}, where 0 ≤ *j* _{0}, …, *j* _{ n − 1} < *N*. Then, for the word *w* = *i* _{0} … *i* _{ n−1},

$$ {S}_w1(x)={N}^{n/2}{a}_{i_0{j}_0}{a}_{i_1{j}_1}\cdots {a}_{i_{n-1}{j}_{n-1}}. $$
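Remark 14 can be checked directly with NumPy's Kronecker product. The sketch below (ours) verifies the entry formula for the 2 × 2 unitary with constant first row, tensoring to the left as in the remark.

```python
import numpy as np

# Remark 14 sketch: the entries of the n-fold Kronecker power of A are
# the products a_{i0 j0} a_{i1 j1} ... a_{i_{n-1} j_{n-1}}. We check this
# for the 2x2 unitary matrix with constant first row 1/sqrt(2).

A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
N, n = 2, 3

A_n = A
for _ in range(n - 1):
    A_n = np.kron(A, A_n)          # tensor to the left: A (x) A^{(x)(n-1)}

# Entry indexed by digit strings (i0..i_{n-1}), (j0..j_{n-1}) in base N,
# with i0 the most significant digit:
i, j = (1, 0, 1), (0, 1, 1)
row = sum(d * N**(n - 1 - k) for k, d in enumerate(i))
col = sum(d * N**(n - 1 - k) for k, d in enumerate(j))
prod = np.prod([A[i[k], j[k]] for k in range(n)])
print(np.isclose(A_n[row, col], prod))
```

Since *A* is unitary, so is every Kronecker power *A* ^{⊗n }, consistent with the orthonormality of the step functions *S* _{ w }1.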

### Example 15

Dutkay et al. (2012) work out the case *N* = 4 for a particular choice of the unitary matrix *A*.

## List of Names and Discoveries

| Year | Discovery | Name |
|---|---|---|
| | Expressing functions as sums of sine and cosine waves of frequencies in arithmetic progression (now called Fourier series) | Jean Baptiste Joseph Fourier: mathematics, physics (heat conduction) |
| 1909 | Discovered, while a student of David Hilbert, an orthonormal basis consisting of step functions, applicable both to functions on an interval and functions on the whole real line. While it was not realized at the time, Haar’s construction was a precursor of what is now known as the Mallat subdivision and multiresolution method, as well as the subdivision wavelet algorithms | Alfred Haar: mathematics |
| 1946 | Discovered basis expansions for what might now be called time-frequency wavelets, as opposed to time-scale wavelets | Denes Gabor (Nobel Prize): physics (optics, holography) |
| | A rigorous formula used by the phone company for sampling speech signals. Quantizing information and entropy, and founder of what is now called the mathematical theory of communication | Claude Elwood Shannon: mathematics, engineering (information theory) |
| | Discovered subband coding of digital transmission of speech signals over the telephone | Claude Garland, Daniel Esteban (both): signal processing |
| | Suggested the term “ondelettes.” J. M. decomposed reflected seismic signals into sums of “wavelets (Fr. ondelettes) of constant shape,” i.e., a decomposition of signals into wavelet shapes, selected from a library of such shapes (now called wavelet series). Received somewhat late recognition for his work. Due to contributions by A. Grossman and Y. Meyer, Morlet’s discoveries have now come to play a central role in the theory | Jean Morlet: petroleum engineer |
| | Mentor for A. Cohen, S. Mallat, and others of the wavelet pioneers, Y. M. discovered infinitely often differentiable wavelets | Yves Meyer: mathematics, applications |
| | Discovered the use of wavelet filters in the analysis of wavelets – the so-called Cohen condition for orthogonality | Albert Cohen: mathematics (orthogonality relations), numerical analysis |
| | Discovered what is now known as the subdivision and multiresolution method, as well as the subdivision wavelet algorithms. This allowed the effective use of operators in the Hilbert space *L* ^{2}(ℝ) and the parallel computational use of recursive matrix algorithms | Stephane Mallat: mathematics, signal and image processing |
| | Discovered differentiable wavelets, with the number of derivatives roughly half the length of the support interval. Further found polynomial algorithms for their construction (with coauthor Jeff Lagarias, joint spectral radius formulas) | Ingrid Daubechies: mathematics, physics, and communications |
| | Discovered the use of a transfer operator in the analysis of wavelets: orthogonality and smoothness | Wayne Lawton: mathematics (the wavelet transfer operator) |
| | C. Brislawn and his group at Los Alamos created the theory and the codes which allowed the compression of the enormous FBI fingerprint file, creating A/D, a new database of fingerprints | The FBI using wavelet algorithms in digitizing and compressing fingerprints |
| | A wavelet-based picture compression standard, called JPEG 2000, for digital encoding of images | The International Standards Organization |
| | Pioneered the use of wavelet bases and tools from statistics to “denoise” images and signals | David Donoho: statistics, mathematics |

## History

While wavelets, as they have long appeared in the mathematics literature (e.g., Daubechies 1992), starting with Haar in 1909, involve function spaces, the connections to a host of discrete problems from engineering are more subtle. Moreover, the deeper connections between the discrete algorithms and the function spaces of mathematical analysis are of a more recent vintage; see, e.g., Strang and Nguyen (1996) and Jorgensen (2006a).

Here we begin with the function spaces. This part of wavelet theory refers to continuous wavelet transforms (details below). It dominated the wavelet literature in the 1980s and is beautifully treated in the first four chapters of Daubechies (1992) and in Daubechies (1993). The word “continuous” refers to the continuum of the real line ℝ. Here we consider spaces of functions in one or more real dimensions: functions on the line ℝ (signals), the plane ℝ^{2} (images), or, in higher dimensions, ℝ^{ d }, i.e., functions of *d* real variables.

## Literature

As evidenced by a simple Google check, the mathematical wavelet literature is gigantic in size, and the manifold applications spread over a vast number of engineering journals. While we cannot do justice to this voluminous literature, we instead offer a collection of the classics (Heil and Walnut 2006), edited by C. Heil and D. Walnut.

## Notes

### Acknowledgments

We thank Professors Dorin Dutkay, Gabriel Picioroaga, and Judy Packer for the helpful discussions.

## Bibliography

- Aubert G, Kornprobst P (2006) Mathematical problems in image processing. Springer, New York
- Baggett L, Jorgensen P, Merrill K, Packer J (2005) A non-MRA Cr frame wavelet with rapid decay. Acta Appl Math 89:251–270
- Baladi V (2000) Positive transfer operators and decay of correlations, vol 16, Advanced series in nonlinear dynamics. World Scientific, River Edge
- Bratelli O, Jorgensen P (2002) Wavelets through a looking glass: the world of the spectrum. Birkhäuser, Boston
- Braverman M (2006) Parabolic Julia sets are polynomial time computable. Nonlinearity 19(6):1383–1401
- Braverman M, Yampolsky M (2006) Non-computable Julia sets. J Am Math Soc 19(3):551–578 (electronic)
- Bredies K, Lorenz DA, Maass P (2006) An optimal control problem in medical image processing
- Daubechies I (1992) Ten lectures on wavelets, vol 61, CBMS-NSF regional conference series in applied mathematics. Society for Industrial and Applied Mathematics, Philadelphia
- Daubechies I (1993) Wavelet transforms and orthonormal wavelet bases. Proc Sympos Appl Math
- Daubechies I, Lagarias JC (1992) Two-scale difference equations. II. Local regularity, infinite products of matrices and fractals. SIAM J Math Anal
- Devaney RL, Look DM (2006) A criterion for Sierpinski curve Julia sets. Topol Proc 30(1):163–179, Spring topology and dynamical systems conference
- Devaney RL, Rocha MM, Siegmund S (2007) Rational maps with generalized Sierpinski gasket Julia sets. Topol Appl 154(1):11–27
- Dutkay DE (2004) The spectrum of the wavelet Galerkin operator. Integral Equ Oper Theory 50:477–487
- Dutkay DE, Jorgensen PET (2005) Wavelet constructions in non-linear dynamics. Electron Res Announc Am Math Soc 11:21–23
- Dutkay DE, Jorgensen PET (2006a) Wavelets on fractals. Rev Mat Iberoamericana 22:131–180
- Dutkay DE, Jorgensen PET (2006b) Hilbert spaces built on a similarity and on dynamical renormalization. J Math Phys 47:053504
- Dutkay DE, Jorgensen PET (2006c) Iterated function systems, Ruelle operators, and invariant projective measures. Math Comput 75:1931
- Dutkay DE, Roysland K (2007a) The algebra of harmonic functions for a matrix-valued transfer operator. arXiv:math/0611539
- Dutkay DE, Roysland K (2007b) Covariant representations for matrix-valued transfer operators. arXiv:math/0701453
- Dutkay DE, Picioroaga G, Song M-S (2012) Orthonormal bases generated by Cuntz algebras. arXiv:1212.4134
- Heil C, Walnut DF (eds) (2006) Fundamental papers in wavelet theory. Princeton University Press, Princeton
- Hutchinson JE (1981) Fractals and self-similarity. Indiana Univ Math J 30(5):713–747
- Jorgensen PET (2003) Matrix factorizations, algorithms, wavelets. Not Am Math Soc 50:880–895
- Jorgensen PET (2006a) Analysis and probability: wavelets, signals, fractals, vol 234, Graduate texts in mathematics. Springer, New York
- Jorgensen PET (2006b) Certain representations of the Cuntz relations, and a question on wavelets decompositions. Contemp Math 414:165–188
- Liu F (2006) Diffusion filtering in image processing based on wavelet transform. Sci China Ser F 49:1–25
- Milnor J (2004) Pasting together Julia sets: a worked out example of mating. Exp Math 13(1):55–92
- Petersen CL, Zakeri S (2004) On the Julia set of a typical quadratic polynomial with a Siegel disk. Ann Math 159(1):1–52
- Schipp F, Wade WR, Simon P (1990) Walsh series: an introduction to dyadic harmonic analysis, with the collaboration of J. Pál. Adam Hilger Ltd., Bristol
- Skodras A, Christopoulos C, Ebrahimi T (2001) JPEG 2000 still image compression standard. IEEE Signal Process Mag 18:36–58
- Song M-S (2006a) Wavelet image compression. PhD thesis, The University of Iowa
- Song M-S (2006b) Wavelet image compression. In: Operator theory, operator algebras, and applications, vol 414, Contemporary mathematics. American Mathematical Society, Providence, pp 41–73
- Strang G (1997) Wavelets from filter banks. Springer, New York
- Strang G (2000) Signal processing for everyone. Lecture notes in mathematics, vol 1739. Springer
- Strang G, Nguyen T (1996) Wavelets and filter banks. Wellesley-Cambridge Press, Wellesley
- Usevitch BE (2001) A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000. IEEE Signal Process Mag 18:22–35
- Walker JS (1999) A primer on wavelets and their scientific applications. Chapman & Hall/CRC, Boca Raton