# Encyclopedia of Complexity and Systems Science

Living Edition
| Editors: Robert A. Meyers

# Comparison of Discrete and Continuous Wavelet Transforms

• Palle E. T. Jorgensen
• Myung-Sin Song
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-27737-5_77-2

## Keywords

Hilbert Space; Wavelet Transform; Continuous Wavelet Transform; Continuous Wavelet; Multiresolution Analysis

Our purpose is to outline a number of direct links between the two cases of wavelet analysis: continuous and discrete. The theme of the first is perhaps best known, for example, the creation of compactly supported wavelets in $$L^2(\mathbb{R}^n)$$ with suitable properties such as localization, vanishing moments, and differentiability. The second (discrete) deals with computation, with sparse matrices, and with algorithms for encoding digitized information such as speech and images. It is centered on constructive approaches to subdivision filters, their representation by sparse matrices, and the corresponding fast algorithms. For both approaches, we outline computational transforms, but our emphasis is on effective and direct links between the computational analysis of discrete filters on the one side and continuous wavelets on the other. In the latter, we include both $$L^2(\mathbb{R}^n)$$ analysis and fractal analysis. To facilitate the discussion of the interplay between the discrete side (used by engineers) and the continuous side (harmonic analysis), we include a list of terminology commonly used in the two areas, together with comments on translation between them.

Multiresolutions. Haar’s work from 1909 to 1910 implicitly had the key idea which got wavelet mathematics started on a roll 75 years later with Yves Meyer, Ingrid Daubechies, Stéphane Mallat, and others – namely, the idea of a multiresolution. In that respect Haar was ahead of his time. See Figs. 1 and 2 for details.
$$\cdots \subset {V}_{-1}\subset {V}_0\subset {V}_1\subset \cdots, \kern0.36em {V}_0+{W}_0={V}_1$$

The word “multiresolution” suggests a connection to optics from physics. So that should have been a hint to mathematicians to take a closer look at trends in signal and image processing! Moreover, even staying within mathematics, it turns out that as a general notion, this same idea of a “multiresolution” has long roots in mathematics, even in such modern and pure areas as operator theory and Hilbert space geometry. Looking even closer at these interconnections, we can now recognize scales of subspaces (so-called multiresolutions) in classical algorithmic construction of orthogonal bases in inner-product spaces, now taught in lots of mathematics courses under the name of the Gram–Schmidt algorithm. Indeed, a closer look at good old Gram–Schmidt reveals that it is a matrix algorithm, hence new mathematical tools involving non-commutativity!
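To make the Gram–Schmidt remark concrete, here is a minimal sketch in Python with NumPy (our choice of language, not part of the original entry). The function name `gram_schmidt` and the sample matrix are illustrative assumptions. The nested subspaces are the spans of the first k input vectors, and the upper-triangular matrix R records that each input vector lies in the span of the orthonormal vectors found so far.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt on the columns of `vectors` (assumed linearly independent)."""
    A = np.asarray(vectors, dtype=float)
    Q = np.zeros_like(A)
    for k in range(A.shape[1]):
        # Subtract the projection onto V_k = span of the first k orthonormal vectors.
        v = A[:, k] - Q[:, :k] @ (Q[:, :k].T @ A[:, k])
        Q[:, k] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(A)
R = Q.T @ A   # upper triangular: column k of A lies in span(Q[:, :k+1])
```

The triangular shape of R is the matrix form of the nesting V_1 ⊂ V_2 ⊂ V_3; for ill-conditioned inputs a modified Gram–Schmidt or a library QR routine would likely be preferable.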

If the signal to be analyzed is an image, then why not select a fixed but suitable resolution (or a subspace of signals corresponding to a selected resolution) and then do the computations there? The selection of a fixed “resolution” is dictated by practical concerns. That idea was key in turning the computation of wavelet coefficients into iterated matrix algorithms. As the matrix operations get large, the computation is carried out in a variety of paths arising from big matrix products. The dichotomy, continuous vs. discrete, is quite familiar to engineers. The industrial engineers typically work with huge volumes of numbers.

Numbers! – so why wavelets? Well, what matters to the industrial engineer is not really the wavelets, but the fact that special wavelet functions serve as an efficient way to encode large data sets – I mean encode for computations. And the wavelet algorithms are computational. They work on numbers. Encoding numbers into pictures, images, or graphs of functions comes later, perhaps at the very end of the computation. But without the graphics, I doubt that we would understand any of this half as well as we do now. The same can be said for the many issues that relate to the crucial mathematical concept of self-similarity, as we know it from fractals and more generally from recursive algorithms.

## Glossary

This Glossary consists of a list of terms used in this entry: in mathematics, in probability, in engineering, and on occasion in physics. To clarify the seemingly confusing use of up to four different names for the same idea or concept, we have further added informal explanations spelling out the reasons behind the differences in current terminology from neighboring fields.

Disclaimer: This glossary has the structure of four columns. A number of terms are listed line by line, and each line is followed by explanation. Some “terms” have up to four separate (yet commonly accepted) names.

Mathematics · Probability · Engineering · Physics

Function (measurable) · Random variable · Signal · State

Mathematically, functions may map between any two sets, say, from X to Y; but if X is a probability space (typically called Ω), it comes with a σ-algebra ℬ of measurable sets and probability measure P. Elements E in ℬ are called events and P(E) the probability of E. Corresponding measurable functions with values in a vector space are called random variables, a terminology which suggests a stochastic viewpoint. The function values of a random variable may represent the outcomes of an experiment, for example, “throwing of a die.”

Yet function theory is widely used in engineering, where functions are typically thought of as signals. In this case, X may be the real line for time, or $$\mathbb{R}^d$$. A particular signal may have a stochastic component, and this feature simply introduces an extra stochastic variable into the “signal,” for example, noise.

Turning to physics, in our present application, the physical functions will typically be in some $$L^2$$ space, and $$L^2$$ functions with unit norm represent quantum mechanical “states.”

Sequence (incl. vector-valued) · Random walk · Time-series · Measurement

Mathematically, a sequence is a function defined on the integers ℤ or on subsets of ℤ, for example, the natural numbers ℕ. Hence, if time is discrete, this to the engineer represents a time series, such as a speech signal, or any measurement which depends on time. But we will also allow functions on lattices such as $$\mathbb{Z}^d$$.

In the case d = 2, we may be considering the grayscale numbers which represent exposure in a digital camera. In this case, the function (grayscale) is defined on a subset of $$\mathbb{Z}^2$$ and is then simply a matrix.

A random walk on $$\mathbb{Z}^d$$ is an assignment of a sequential and random motion as a function of time. The randomness presupposes assigned probabilities. But we will use the term “random walk” also in connection with random walks on combinatorial trees.

Nested subspaces · Refinement · Multiresolution · Scales of visual resolutions

While finite or infinite families of nested subspaces are ubiquitous in mathematics and have been popular in Hilbert space theory for generations (at least since the 1930s), this idea was revived in a different guise in 1986 by Stéphane Mallat, then an engineering graduate student. In its adaptation to wavelets, the idea is now referred to as the multiresolution method.

What made the idea especially popular in the wavelet community was that it offered a skeleton on which various discrete algorithms in applied mathematics could be attached and turned into wavelet constructions in harmonic analysis. In fact, what we now call multiresolutions have come to signify a crucial link between two worlds: on the one side, the discrete wavelet algorithms that are popular in computational mathematics and in engineering (signal/image processing, data mining, etc.), and on the other side, continuous wavelet bases in function spaces, especially in $$L^2(\mathbb{R}^d)$$. Further, the multiresolution idea closely mimics how fractals are analyzed with the use of finite function systems.

But in mathematics, or more precisely in operator theory, the underlying idea dates back to the work of John von Neumann, Norbert Wiener, and Herman Wold, where nested and closed subspaces in Hilbert space were used extensively in an axiomatic approach to stationary processes, especially for time series. Wold proved that any (stationary) time series can be decomposed into two different parts: The first (deterministic) part can be exactly described by a linear combination of its own past; in the language of von Neumann, it is the unitary part. The second part is the opposite extreme: it carries no deterministic component and, in von Neumann’s language, corresponds to a shift.

Von Neumann’s version of the same theorem is a pillar in operator theory. It states that every isometry in a Hilbert space ℋ is, in a unique way, the orthogonal sum of a shift isometry and a unitary operator; i.e., the initial Hilbert space ℋ splits canonically as an orthogonal sum of two subspaces $$\mathcal{H}_s$$ and $$\mathcal{H}_u$$ in ℋ: the first, $$\mathcal{H}_s$$, carries the shift operator, and the second, $$\mathcal{H}_u$$, the unitary part. The shift isometry is defined from a nested scale of closed spaces $$V_n$$, such that the intersection of these spaces is $$\mathcal{H}_u$$. Specifically,

$$\begin{array}{c}\cdots \subset {V}_{-1}\subset {V}_0\subset {V}_1\subset {V}_2\subset \cdots \subset {V}_n\subset {V}_{n+1}\subset \cdots \\ {}\underset{n}{\bigwedge }{V}_n={\mathrm{\mathcal{H}}}_u,\kern0.5em \mathrm{and}\kern0.5em \underset{n}{\bigvee }{V}_n=\mathrm{\mathcal{H}}.\end{array}$$

However, Stéphane Mallat was motivated instead by the notion of scales of resolutions in the sense of optics. This in turn is based on a certain “artificial-intelligence” approach to vision and optics, developed earlier by David Marr at MIT, an approach which imitates the mechanism of vision in the human eye.

The connection from these developments in the 1980s back to von Neumann is this: Each of the closed subspaces $$V_n$$ corresponds to a level of resolution in such a way that a larger subspace represents a finer resolution. Resolutions are relative, not absolute! In this view, the relative complement of the smaller (or coarser) subspace in the larger space then represents the visual detail which is added in passing from a blurred image to a finer one, i.e., to a finer visual resolution.

This view became an instant hit in the wavelet community, as it offered a repository for the fundamental father and the mother functions, also called the scaling function φ and the wavelet function ψ. Via a system of translation and scaling operators, these functions then generate nested subspaces, and we recover the scaling identities which initialize the appropriate algorithms. What results is now called the family of pyramid algorithms in wavelet analysis. The approach itself is called the multiresolution approach (MRA) to wavelets. And in the meantime various generalizations (GMRAs) have emerged.

In all of this, there was a second “accident” at play: As it turned out, pyramid algorithms in wavelet analysis now lend themselves via multiresolutions, or nested scales of closed subspaces, to an analysis based on frequency bands. Here we refer to bands of frequencies as they have already been used for a long time in signal processing.

One reason for the success in varied disciplines of the same geometric idea is perhaps that it is closely modeled on how we historically have represented numbers in the positional number system. Analogies to the Euclidean algorithm seem especially compelling.

Operator · Process · Black box

In linear algebra, students are familiar with the distinction between (linear) transformations T (here called “operators”) and matrices. For a fixed operator $$T:V\to W$$, there is a variety of matrices, one for each choice of basis in V and in W. In many engineering applications, the transformations are not restricted to be linear, but instead represent some experiment (“black box,” in Norbert Wiener’s terminology), one with an input and an output, usually functions of time. The input could be an external voltage function, the black box an electric circuit, and the output the resulting voltage in the circuit. (The output is a solution to a differential equation.)

This context is somewhat different from that of quantum mechanical (QM) operators $$T:V\to V$$, where V is a Hilbert space. In QM, self-adjoint operators represent observables such as position Q and momentum P, or time and energy.

Fourier dual pair · Generating function · Time/frequency · P/Q

The dual pairs position Q/momentum P and time/energy may be computed with the use of Fourier series or Fourier transforms, and in this sense they are examples of Fourier dual pairs. If, for example, time is discrete, then frequency may be represented by numbers in the interval [0, 2π), or in [0, 1) if we enter the number 2π into the Fourier exponential. Functions of the frequency are then periodic, so the two endpoints are identified. In the case of the interval [0, 1), 0 on the left is identified with 1 on the right. So a low-frequency band is an interval centered at 0, while a high-frequency band is an interval centered at 1/2. Let a function W on [0, 1) represent a probability assignment. Such functions W are thought of as “filters” in signal processing. We say that W is low pass if it is 1 at 0 or if it is near 1 for frequencies near 0. Low-pass filters pass signals with low frequencies and block the others.

If instead some filter W is 1 at 1/2 or takes values near 1 for frequencies near 1/2, then we say that W is high pass; it passes signals with high frequency.
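As a hedged illustration of the low-pass/high-pass terminology, the following Python sketch evaluates two filter functions on the frequency interval [0, 1). The Haar filter pair and the helper name `frequency_response` are assumptions made for this example only; the glossary does not single out a particular filter. With these choices, W_low is 1 at frequency 0, W_high is 1 at frequency 1/2, and the two add up to 1, in line with the reading of W as a probability assignment.

```python
import numpy as np

def frequency_response(taps, omega):
    """Evaluate m(omega) = sum_k taps[k] * exp(-2*pi*i*k*omega) for omega in [0, 1)."""
    taps = np.asarray(taps, dtype=float)
    k = np.arange(len(taps))
    return np.sum(taps[:, None] * np.exp(-2j * np.pi * k[:, None] * omega), axis=0)

# Haar filter pair (an illustrative assumption), normalized so that m_low(0) = 1.
low_taps = [0.5, 0.5]
high_taps = [0.5, -0.5]

omega = np.linspace(0.0, 1.0, 8, endpoint=False)   # omega[4] is the frequency 1/2
W_low = np.abs(frequency_response(low_taps, omega)) ** 2
W_high = np.abs(frequency_response(high_taps, omega)) ** 2
```

Here W_low equals cos²(πω) and W_high equals sin²(πω), so each frequency is split between the two bands.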

Convolution · Filter · Smearing

Pointwise multiplication of functions of frequencies corresponds in the Fourier dual time domain to the operation of convolution (or of Cauchy product if the time scale is discrete). The process of modifying a signal with a fixed convolution is called a linear filter in signal processing. The corresponding Fourier dual frequency function is then referred to as “frequency response” or the “frequency response function.”

More generally, in the continuous case, since convolution tends to improve smoothness of functions, physicists call it “smearing.”
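The correspondence between pointwise multiplication and convolution can be checked numerically. The sketch below, in Python with NumPy, uses a short sample sequence and a two-tap averaging filter (both chosen here purely for illustration) and compares the Cauchy product in the time domain with the pointwise product of zero-padded discrete Fourier transforms.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.5, 0.5])          # a two-tap averaging filter

# Time domain: Cauchy product of the two finite sequences.
time_domain = np.convolve(signal, kernel)

# Frequency domain: pointwise product of the transforms, zero-padded to the full length.
n = len(signal) + len(kernel) - 1
freq_domain = np.fft.ifft(np.fft.fft(signal, n) * np.fft.fft(kernel, n)).real
```

The zero-padding to length n matters: without it the FFT product would compute a circular convolution rather than the Cauchy product.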

Decomposition (e.g., Fourier coefficients in a Fourier expansion) · Analysis · Frequency components

Calculating the Fourier coefficients is “analysis,” and adding up the pure frequencies (i.e., summing the Fourier series) is called synthesis. But this view carries over more generally to engineering where there are more operations involved on the two sides, e.g., breaking up a signal into its frequency bands, transforming further, and then adding up the “banded” functions in the end. If the signal out is the same as the signal in, we say that the analysis/synthesis yields perfect reconstruction.

Integrate (e.g., inverse Fourier transform) · Reconstruct · Synthesis · Superposition

Here the terms related to “synthesis” refer to the second half of the kind of signal-processing design outlined in the previous paragraph.

Subspace · Resolution · (Signals in a) frequency band

For a space of functions (signals), the selection of certain frequencies serves as a way of selecting special signals. When the process of scaling is introduced into the optics of a digital camera, we note that a nested family of subspaces corresponds to a grading of visual resolutions.

Cuntz relations · Perfect reconstruction from subbands · Subband decomposition

$${\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=1,\kern0.5em \mathrm{and}\kern0.5em {S}_i^{*}{S}_j={\delta}_{i,j}1.$$

Inner product · Correlation · Transition probability · Probability of transition from one state to another

In many applications, a vector space with inner product captures perfectly the geometric and probabilistic features of the situation. This can be axiomatized in the language of Hilbert space; and the inner product is the most crucial ingredient in the familiar axiom system for Hilbert space.

$$f_{\mathrm{out}}=Tf_{\mathrm{in}}$$ · Input/output · Transformation of states

Systems theory language for operators $$T:V\to W$$, where vectors in V are inputs and vectors in the range of T are outputs.

Fractal

Intuitively, think of a fractal as reflecting similarity of scales such as is seen in fernlike images that look “roughly” the same at small and at large scales. Fractals are produced from an infinite iteration of a finite set of maps, and this algorithm is perfectly suited to the kind of subdivision which is a cornerstone of the discrete wavelet algorithm. Self-similarity could refer alternately to space and to time. And further versatility is added, in that flexibility is allowed into the definition of “similar.”

Data mining

The problem of how to handle and make use of large volumes of data is a corollary of the digital revolution. As a result, the subject of data mining itself changes rapidly. Digitized information (data) is now easy to capture automatically and to store electronically. In science, in commerce, and in industry, data represents collected observations and information: In business, there is data on markets, competitors, and customers. In manufacturing, there is data for optimizing production opportunities and for improving processes. A tremendous potential for data mining exists in medicine, genetics, and energy. But raw data is not always directly usable, as is evident by inspection. A key to advances is our ability to extract information and knowledge from the data (hence “data mining”) and to understand the phenomena governing data sources. Data mining is now taught in a variety of forms in engineering departments, as well as in statistics and computer science departments.

One of the structures often hidden in data sets is some degree of scale. The goal is to detect and identify one or more natural global and local scales in the data. Once this is done, it is often possible to detect associated similarities of scale, much like the familiar scale similarity from multidimensional wavelets and from fractals. Indeed, various adaptations of wavelet-like algorithms have been shown to be useful. These algorithms themselves are useful in detecting scale similarities and are applicable to other types of pattern recognition. Hence, in this context, generalized multiresolutions offer another tool for discovering structures in large data sets, such as those stored in the resources of the Internet. Because of the sheer volume of data involved, a strictly manual analysis is out of the question. Instead, sophisticated query processors based on statistical and mathematical techniques are used in generating insights and extracting conclusions from data sets.

## Definition

In this entry we outline several points of view on the interplay between discrete and continuous wavelet transforms, stressing both pure and applied aspects of both. We outline some new links between the two transform technologies based on the theory of representations of generators and relations. By this, we mean a finite system of generators which are represented by operators in Hilbert space. We further outline how these representations yield subband filter banks for signal- and image-processing algorithms.

The term “wavelet transform” (WT) means different things to different people: Pure and applied mathematicians typically give different answers to the question “What is the WT?” And engineers in turn have their own preferred, quite different approach to WTs. Still, there are two main trends in how WTs are used: the continuous WT on one side and the discrete WT on the other. Here we offer a user-friendly outline of both, but with a slant toward geometric methods from the theory of operators in Hilbert space.

Our entry is organized as follows: For the benefit of diverse reader groups, we begin with section “Glossary.” This is a substantial part of our account, and it reflects the multiplicity of how the subject is used.

The concept of multiresolutions or multiresolution analysis (MRA) serves as a link between the discrete and continuous theory.

In section “List of Names and Discoveries,” we summarize how different mathematicians and scientists have contributed to and shaped the subject over the years.

The next two sections then offer a technical overview of both discrete and continuous WTs. This includes basic tools from Fourier analysis and from operators in Hilbert space. In sections “Tools from Mathematics” and “A Transfer Operator,” we outline the connections between the separate parts of mathematics and their applications to WTs.

## Introduction

While applied problems such as time series, signals, and processing of digital images come from engineering and from the sciences, they have in the past two decades taken on a life of their own as an exciting new area of applied mathematics. While searches in Google on these keywords typically yield sites numbered in the millions, the diversity of applications is wide, and it seems reasonable here to narrow our focus to some of the approaches that are both more mathematical and more recent. For references, see, for example, Aubert and Kornprobst (2006), Bredies et al. (2006), Liu (2006), Strang and Nguyen (1996). In addition, our own interests (e.g., Jorgensen 2003, 2006a; Song 2006a, b) have colored the presentation below. Each of the two areas, the discrete side and the continuous theory, is huge as measured by recent journal publications. A leading theme in our entry is the independent interest in a multitude of interconnections between the discrete algorithms and their uses in the more mathematical analysis of function spaces (continuous wavelet transforms). The mathematics involved in the study and the applications of this interaction is, we feel, of benefit to both mathematicians and engineers. See also Jorgensen (2003). An early paper by Daubechies and Lagarias (1992) was especially influential in connecting the two worlds, discrete and continuous.

## The Discrete Versus Continuous Wavelet Algorithms

### The Discrete Wavelet Transform

If one stays with function spaces, it is then popular to pick the d-dimensional Lebesgue measure on $$\mathbb{R}^d$$, d = 1, 2, …, and pass to the Hilbert space $$L^2(\mathbb{R}^d)$$ of all square integrable functions on $$\mathbb{R}^d$$ with respect to d-dimensional Lebesgue measure. A wavelet basis refers to a family of basis functions for $$L^2(\mathbb{R}^d)$$ generated from a finite set of normalized functions $$\psi_i$$, the index i chosen from a fixed and finite index set I, and from two operations: one called scaling and the other translation. The scaling is typically specified by a d × d matrix A over the integers ℤ such that all the eigenvalues of A are bigger than one in modulus, i.e., lie outside the closed unit disk in the complex plane. The d-lattice is denoted $$\mathbb{Z}^d$$, and the translations will be by vectors selected from $$\mathbb{Z}^d$$. We say that we have a wavelet basis if the triple-indexed family $$\psi_{i,j,k}(x):=|\det A|^{j/2}\psi_i(A^jx+k)$$ forms an orthonormal basis (ONB) for $$L^2(\mathbb{R}^d)$$ as i varies in I, j ∈ ℤ, and $$k\in\mathbb{Z}^d$$. The word “orthonormal” for a family F of vectors in a Hilbert space ℋ refers to the norm and the inner product in ℋ: The vectors in an orthonormal family F are assumed to have norm one and to be mutually orthogonal. If the family is also total (i.e., the vectors in F span a subspace which is dense in ℋ), we say that F is an orthonormal basis (ONB).

While there are other popular wavelet bases, for example, frame bases and dual bases (see, e.g., Baggett et al. (2005), Dutkay and Roysland (2007b) and the papers cited there), the ONBs are the most agreeable at least from the mathematical point of view.

That there are bases of this kind is not at all clear, and the subject of wavelets in this continuous context has gained much from its connections to the discrete world of signal and image processing.

Here we shall outline some of these connections with an emphasis on the mathematical context. So we will be stressing the theory of Hilbert space and bounded linear operators acting in Hilbert space ℋ, both individual operators and families of operators which form algebras.

As was noticed recently, the operators which specify particular subband algorithms from the discrete world of signal processing turn out to satisfy relations that were found (or rediscovered independently) in the theory of operator algebras and which go under the name of Cuntz algebras, denoted $${\mathcal{O}}_N$$ if N is the number of bands. For additional details, see, e.g., Jorgensen (2006a).

In symbols, the C*-algebra has generators $$(S_i)_{i=0}^{N-1}$$, and the relations are
$${\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=1$$
(1)
(where 1 is the identity element in $${\mathcal{O}}_N$$) and
$${S}_i^{*}{S}_j={\delta}_{i,j}1.$$
(2)
In a representation on a Hilbert space, say ℋ, the symbols $$S_i$$ turn into bounded operators, also denoted $$S_i$$, and the identity element 1 turns into the identity operator I in ℋ, i.e., the operator $$I:h\mapsto h$$, for h ∈ ℋ. In operator language, the two formulas, Eqs. 1 and 2, state that each $$S_i$$ is an isometry in ℋ and that the respective ranges $$S_i\mathcal{H}$$ are mutually orthogonal, i.e., $$S_i\mathcal{H}\perp S_j\mathcal{H}$$ for i ≠ j. Introducing the projections $$P_i=S_iS_i^{*}$$, we get $$P_iP_j=\delta_{i,j}P_i$$, and
$${\displaystyle \sum_{i=0}^{N-1}}{P}_i=I$$

In the engineering literature this takes the form of programming diagrams.
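A small numerical model may help in reading the Cuntz relations. The genuine isometries act on an infinite-dimensional space such as ℓ²(ℤ); as a simplifying assumption, the Python sketch below uses Haar filters on finite vectors, so that the two operators map ℝⁿ into ℝ²ⁿ (upsampling followed by filtering). The function name `haar_isometries` is hypothetical. Within this finite model one can verify the relations for N = 2, the projection identities, and perfect reconstruction.

```python
import numpy as np

def haar_isometries(n):
    """Return S0, S1 as (2n x n) matrices: upsample by 2, then apply the Haar filters."""
    S0 = np.zeros((2 * n, n))
    S1 = np.zeros((2 * n, n))
    for k in range(n):
        S0[2 * k, k] = S0[2 * k + 1, k] = 1 / np.sqrt(2)   # average (low-pass) branch
        S1[2 * k, k] = 1 / np.sqrt(2)                       # detail (high-pass) branch
        S1[2 * k + 1, k] = -1 / np.sqrt(2)
    return S0, S1

n = 4
S0, S1 = haar_isometries(n)
P0, P1 = S0 @ S0.T, S1 @ S1.T   # projections onto the two subbands

x = np.arange(2 * n, dtype=float)
reconstructed = S0 @ (S0.T @ x) + S1 @ (S1.T @ x)   # analysis followed by synthesis
```

In this model the adjoints `S0.T` and `S1.T` play the role of the analysis (filter and downsample) steps, and the sum of the two projections being the identity is the perfect-reconstruction property.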

If the process of Fig. 3 is repeated, we arrive at the discrete wavelet transform, shown in the form of images for n = 5 in Fig. 4.

Selecting a resolution subspace $$V_0=\overline{\mathrm{span}}\{\varphi(\cdot-k)\mid k\in\mathbb{Z}\}$$, we arrive at a wavelet subdivision $$\{\psi_{j,k}\mid j\ge 0,\ k\in\mathbb{Z}\}$$, where $$\psi_{j,k}(x)=2^{j/2}\psi(2^jx-k)$$, and the continuous expansion $$f={\displaystyle \sum_{j,k}}\left\langle {\psi}_{j,k}\Big|f\right\rangle {\psi}_{j,k}$$, or the discrete analogue derived from the isometries $$S_0^kS_i$$, for i = 1, 2, ⋯, N − 1 and k = 0, 1, 2, ⋯, called the discrete wavelet transform.
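The repeated subdivision can be sketched as a pyramid algorithm. The Python code below is a minimal illustration for N = 2 that assumes the Haar filters and a signal whose length is divisible by 2 to the power of the number of levels; the function names `haar_step`, `haar_dwt`, and `haar_idwt` are hypothetical. Each pass splits the current averages into coarser averages and details, which loosely corresponds to applying the low-pass branch k times before taking one detail branch.

```python
import numpy as np

def haar_step(x):
    """One analysis step: split x (even length) into averages and details."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    det = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, det

def haar_dwt(x, levels):
    """Iterate the low-pass branch `levels` times; return [a_n, d_n, ..., d_1]."""
    details = []
    avg = np.asarray(x, dtype=float)
    for _ in range(levels):
        avg, det = haar_step(avg)
        details.append(det)
    return [avg] + details[::-1]

def haar_idwt(coeffs):
    """Invert haar_dwt by running the synthesis steps from coarse to fine."""
    avg = coeffs[0]
    for det in coeffs[1:]:
        x = np.empty(2 * len(avg))
        x[0::2] = (avg + det) / np.sqrt(2)
        x[1::2] = (avg - det) / np.sqrt(2)
        avg = x
    return avg

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_dwt(signal, levels=3)
restored = haar_idwt(coeffs)
```

Because each step is orthogonal, the coefficients carry the same energy as the signal, and the inverse recovers the signal up to rounding.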

Notational convention. In algorithms, the letter N is popular and often used for counting more than one thing.

In the present context of the Discrete Wavelet Algorithm (DWA) or DWT, we count two things. The first is “the number of times a picture is decomposed via subdivision”; we have used n for this. The other related but different number, N, is the number of subbands: N = 2 for the dyadic DWT and N = 4 for the image DWT. The image-processing WT in our present context is the tensor product of the 1-D dyadic WT, so 2 × 2 = 4. Caution: Not all DWAs arise as tensor products of N = 2 models. The wavelets coming from tensor products are called separable. When a particular image-processing scheme is used for generating continuous wavelets, it is not transparent whether we are looking at a separable or inseparable wavelet!

To clarify the distinction, it is helpful to look at the representations of the Cuntz relations by operators in Hilbert space. We are dealing with representations of the two distinct algebras $${\mathcal{O}}_2$$ and $${\mathcal{O}}_4$$: two frequency subbands versus four subbands. Note that the Cuntz algebras $${\mathcal{O}}_2$$ and $${\mathcal{O}}_4$$ are given axiomatically, or purely symbolically. It is only when subband filters are chosen that we get representations. This also means that the choice of N is made initially, and the same N is used in different runs of the programs. In contrast, the number of times a picture is decomposed varies from one experiment to the next! (Fig. 5)

Summary: N = 2 for the dyadic DWT: The operators in the representation are $$S_0$$ and $$S_1$$: one average operator and one detail operator. The detail operator $$S_1$$ “counts” local detail variations.

Image processing. Then N = 4 is fixed as we run different images in the DWT: The operators are now $$S_0$$, $$S_H$$, $$S_V$$, and $$S_D$$ – one average operator and three detail operators for local detail variations in the three directions in the plane.

### The Continuous Wavelet Transform

Consider functions f on the real line ℝ. We select the Hilbert space of functions to be $$L^2(\mathbb{R})$$. To start a continuous WT, we must select a function $$\psi\in L^2(\mathbb{R})$$ and parameters r, s ∈ ℝ, r ≠ 0, such that the following family of functions
$${\psi}_{r,s}(x)={\left|r\right|}^{-1/2}\psi \left(\frac{x-s}{r}\right)$$
creates an over-complete basis for $$L^2(\mathbb{R})$$. An over-complete family of vectors in a Hilbert space is often called a coherent decomposition. This terminology comes from quantum optics. What is needed for a continuous WT in the simplest case is the following representation valid for all $$f\in L^2(\mathbb{R})$$:
$$f(x)={C}_{\psi}^{-1}{\displaystyle \underset{{\mathrm{\mathbb{R}}}^2}{\iint }}\left\langle {\psi}_{r,s}\Big|f\right\rangle {\psi}_{r,s}(x)\frac{dr\,ds}{r^2}$$
where $${C}_{\psi }:={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}{\left|\widehat{\psi}\left(\omega \right)\right|}^2\frac{d\omega}{\left|\omega \right|}$$ and where $$\left\langle {\psi}_{r,s}\Big|f\right\rangle ={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}\overline{\psi_{r,s}(y)}f(y)\, dy$$. The refinements and implications of this are spelled out in tables in section “Connections to Group Theory.”
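For a numerical feel of the coefficients in the formula above, the following Python sketch approximates the inner product of a sample signal with one member of the family by a Riemann sum. The Mexican-hat function is used only as an example of an admissible ψ; the grid, the sample signal, and the function names are all assumptions of this sketch, and only positive scales r are handled.

```python
import numpy as np

def mexican_hat(x):
    """Mexican-hat wavelet (unnormalized); chosen here only as an example of psi."""
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def psi_rs(x, r, s, psi=mexican_hat):
    """psi_{r,s}(x) = r^(-1/2) * psi((x - s) / r), for r > 0."""
    return r ** -0.5 * psi((x - s) / r)

def cwt_coefficient(f_values, x, r, s, psi=mexican_hat):
    """Riemann-sum approximation of <psi_{r,s} | f> on the uniform grid x."""
    dx = x[1] - x[0]
    return np.sum(np.conj(psi_rs(x, r, s, psi)) * f_values) * dx

x = np.linspace(-40.0, 40.0, 8001)
f = np.exp(-(x - 1.0) ** 2)                 # a sample signal
coeff = cwt_coefficient(f, x, r=2.0, s=1.0)
```

The factor r^(-1/2) keeps the L² norm of the scaled and translated function equal to that of ψ, and the zero mean of ψ is what makes the constant C_ψ finite in this example.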

### Some Background on Hilbert Space

Wavelet theory is the art of finding a special kind of basis in Hilbert space. Let ℋ be a Hilbert space over ℂ and denote the inner product by 〈 ⋅ | ⋅ 〉. For us, it is assumed linear in the second variable. If $$\mathcal{H}=L^2(\mathbb{R})$$, then
$$\left\langle f\Big|g\right\rangle :={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}\kern0.15em \overline{f(x)}\;g(x)\; dx.$$
If $$\mathcal{H}=\ell^2(\mathbb{Z})$$, then
$$\left\langle \xi \Big|\eta \right\rangle :={\displaystyle \sum_{n\in \mathrm{\mathbb{Z}}}}{\overline{\xi}}_n{\eta}_n.$$
Let $$\mathbb{T}=\mathrm{\mathbb{R}}/2\pi \mathrm{\mathbb{Z}}$$. If $$\mathrm{\mathcal{H}}={L}^2\left(\mathbb{T}\right)$$, then
$$\left\langle\;f\Big|g\right\rangle :=\frac{1}{2\pi }{\displaystyle \underset{-\pi }{\overset{\pi }{\int }}}\overline{f\left(\theta \right)}\;g\left(\theta \right)\; d\theta .$$
Functions $$f\in {L}^2\left(\mathbb{T}\right)$$ have Fourier series: Setting e n (θ) = e inθ ,
$$\widehat{f}(n):=\left\langle {e}_n\Big|f\right\rangle =\frac{1}{2\pi }{\displaystyle \underset{-\pi }{\overset{\pi }{\int }}}{e}^{- in\theta}f\left(\theta \right)\; d\theta,$$
and
$${\left\Vert f\right\Vert}_{L^2\left(\mathbb{T}\right)}^2={\displaystyle \sum_{n\in \mathrm{\mathbb{Z}}}}{\left|\widehat{f}(n)\right|}^2.$$
Similarly, if $$f\in L^2(\mathbb{R})$$, then
$$\widehat{f}(t):={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}{e}^{- ixt}f(x)\; dx,$$
and
$${\left\Vert f\right\Vert}_{L^2\left(\mathrm{\mathbb{R}}\right)}^2=\frac{1}{2\pi }{\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}{\left|\widehat{f}(t)\right|}^2\; dt.$$
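The Fourier-series identities above can be checked on a trigonometric polynomial. In the Python sketch below, the test function and the grid size M are arbitrary choices made for illustration; for such functions the mean over a uniform grid reproduces the normalized integral (1/2π)∫ exactly, up to rounding.

```python
import numpy as np

M = 64
theta = -np.pi + 2 * np.pi * np.arange(M) / M       # uniform grid on the circle
f = 1.0 + 2.0 * np.cos(theta) + 3.0 * np.sin(2 * theta)

def fourier_coefficient(n):
    """hat f(n) = (1/2pi) * integral of exp(-i n theta) f(theta), via the grid mean."""
    return np.mean(np.exp(-1j * n * theta) * f)

coeffs = {n: fourier_coefficient(n) for n in range(-3, 4)}
norm_squared = np.mean(np.abs(f) ** 2)               # (1/2pi) * integral of |f|^2
parseval_sum = sum(abs(c) ** 2 for c in coeffs.values())
```

Both sides of the Parseval identity come out equal for this f, since all of its nonzero Fourier coefficients have index between −2 and 2.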
Let J be an index set. We shall only need to consider the case when J is countable. Let $$\{\psi_\alpha\}_{\alpha\in J}$$ be a family of nonzero vectors in a Hilbert space ℋ. We say it is an orthonormal basis (ONB) if
$$\left\langle\;{\psi}_{\alpha}\Big|{\psi}_{\beta}\right\rangle ={\delta}_{\alpha, \beta}\kern2em \left(\mathrm{Kronecker}\kern0.36em \mathrm{delta}\right)$$
(3)
and if
$${\displaystyle \sum_{\alpha \in J}}{\left|\;\left\langle {\psi}_{\alpha}\Big|f\right\rangle\;\right|}^2={\left\Vert f\right\Vert}^2\kern2em \mathrm{holds}\kern0.24em \mathrm{for}\kern0.24em \mathrm{all}\kern0.24em f\in \mathrm{\mathcal{H}}.$$
(4)
If only (Eq. 4) is assumed, but not (Eq. 3), we say that $$\{\psi_\alpha\}_{\alpha\in J}$$ is a (normalized) tight frame. We say that it is a frame with frame constants 0 < A ≤ B < ∞ if
$$A{\left\Vert f\right\Vert}^2\le {\displaystyle \sum_{\alpha \in J}}{\left|\;\left\langle {\psi}_{\alpha}\Big|f\right\rangle\;\right|}^2\le B{\left\Vert f\right\Vert}^2\kern2em \mathrm{holds}\kern0.24em \mathrm{for}\kern0.24em \mathrm{all}\kern0.24em f\in \mathrm{\mathcal{H}}.$$
Introducing the rank-one operators $$Q_\alpha:=|\psi_\alpha\rangle\langle\psi_\alpha|$$ of Dirac’s terminology, see Bratteli and Jorgensen (2002), we see that $$\{\psi_\alpha\}_{\alpha\in J}$$ is an ONB if and only if the $$Q_\alpha$$’s are projections, and
$$\kern0.5em {\displaystyle \sum_{\alpha \in J}}{Q}_{\alpha }=I\kern0.5em \left(=\mathrm{the}\kern0.5em \mathrm{identity}\kern0.5em \mathrm{operator}\kern0.5em \mathrm{in}\kern0.5em \mathrm{\mathcal{H}}\right).$$
(5)
It is a (normalized) tight frame if and only if (Eq. 5) holds but with no further restriction on the rank-one operators Q_α. It is a frame with frame constants A and B if the operator
$$\mathrm{S}:={\displaystyle \sum_{\alpha \in J}}{Q}_{\alpha }$$
satisfies
$$AI\le S\le BI$$
in the order of Hermitian operators. (We say that operators H_i = H_i*, i = 1, 2, satisfy H_1 ≤ H_2 if 〈f|H_1 f〉 ≤ 〈f|H_2 f〉 holds for all f ∈ ℋ.) If h, k are vectors in a Hilbert space ℋ, then the operator A = |h〉〈k| is defined by the identity 〈u|Av〉 = 〈u|h〉〈k|v〉 for all u, v ∈ ℋ.
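The distinction between an ONB and a normalized tight frame can be checked by hand in a finite-dimensional case. The sketch below is an illustration only: the choice of ℋ = ℝ² and of three vectors at 120-degree spacing, rescaled by √(2/3), is an assumption made here and does not come from the text. For this family, (Eq. 4) and (Eq. 5) hold while (Eq. 3) fails.

```python
import math

# Three vectors in R^2 at 120-degree spacing, rescaled by sqrt(2/3).
# This family is an illustrative assumption, not taken from the article.
SCALE = math.sqrt(2.0 / 3.0)
FRAME = [
    (SCALE * math.cos(2 * math.pi * k / 3), SCALE * math.sin(2 * math.pi * k / 3))
    for k in range(3)
]

def inner(u, v):
    """Real inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def frame_energy(f, frame=FRAME):
    """Left-hand side of Eq. 4: sum over alpha of |<psi_alpha | f>|^2."""
    return sum(inner(psi, f) ** 2 for psi in frame)

def frame_operator(frame=FRAME):
    """2x2 matrix of S = sum of the rank-one operators |psi><psi|."""
    return [[sum(p[i] * p[j] for p in frame) for j in range(2)] for i in range(2)]

f = (0.3, -1.7)
print(frame_energy(f), inner(f, f))   # the two numbers agree (Eq. 4)
print(frame_operator())               # close to the identity matrix (Eq. 5)
print(inner(FRAME[0], FRAME[1]))      # nonzero, so Eq. 3 fails
```

Since the three vectors are linearly dependent and not mutually orthogonal, the family is over-complete; it is a tight frame but not an ONB.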
Wavelets in L 2(ℝ) are generated by simple operations on one or more functions ψ in L 2(ℝ); the operations come in pairs, say scaling and translation or phase modulation and translations. If N ∈ {2, 3, …}, we set
$${\psi}_{j,k}(x):={N}^{j/2}\psi \left({N}^jx-k\right)\kern2em \mathrm{for}\;j,k\in \mathrm{\mathbb{Z}}.$$

#### Increasing the Dimension

In wavelet theory (Daubechies 1992), there is a tradition for reserving φ for the father function and ψ for the mother function. A 1-level wavelet transform of an N × M image can be represented as
$$f\mapsto \left(\begin{array}{ccc}\hfill {a}^1\hfill & \hfill \Big|\hfill & \hfill {h}^1\hfill \\ {}\hfill --\hfill & \hfill \hfill & \hfill --\hfill \\ {}\hfill {v}^1\hfill & \hfill \Big|\hfill & \hfill {d}^1\hfill \end{array}\right)$$
(6)
where the subimages h¹, d¹, a¹, and v¹ each have dimension N/2 by M/2:
$$\begin{array}{l}{\mathrm{a}}^1={V}_m^1\otimes {V}_n^1:{\varphi}^A\left(x,y\right)=\varphi (x)\varphi (y)={\displaystyle \sum_i}{\displaystyle \sum_j}{h}_i{h}_j\varphi \left(2x-i\right)\varphi \left(2y-j\right)\hfill \\ {}{\mathrm{h}}^1={V}_m^1\otimes {W}_n^1:{\psi}^H\left(x,y\right)=\psi (x)\varphi (y)={\displaystyle \sum_i}{\displaystyle \sum_j}{g}_i{h}_j\varphi \left(2x-i\right)\varphi \left(2y-j\right)\hfill \\ {}{\mathrm{v}}^1={W}_m^1\otimes {V}_n^1:{\psi}^V\left(x,y\right)=\varphi (x)\psi (y)={\displaystyle \sum_i}{\displaystyle \sum_j}{h}_i{g}_j\varphi \left(2x-i\right)\varphi \left(2y-j\right)\hfill \\ {}{\mathrm{d}}^1={W}_m^1\otimes {W}_n^1:{\psi}^D\left(x,y\right)=\psi (x)\psi (y)={\displaystyle \sum_i}{\displaystyle \sum_j}{g}_i{g}_j\varphi \left(2x-i\right)\varphi \left(2y-j\right)\hfill \end{array}$$
(7)
where φ is the father function and ψ is the mother function in the sense of wavelets, the V spaces denote the average spaces, and the W spaces are the difference spaces from multiresolution analysis (MRA) (Daubechies 1992).
In the formulas, we have the following two indexed number systems a := (h_i) and d := (g_i): a is for averages and d is for local differences. They are really the input for the DWT. But they are also the key link between the two transforms, the discrete and the continuous. The link is made up of the following scaling identities:
$$\begin{array}{l}\varphi (x)=2{\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{h}_i\varphi \left(2x-i\right);\\ {}\psi (x)=2{\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{g}_i\varphi \left(2x-i\right);\end{array}$$
and (low-pass normalization) $${\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{h}_i=1$$. The scalars (h_i) may be real or complex; they may be finite or infinite in number. If there are four of them, the filter is called the “four tap.” The finite case is best for computations since it corresponds to compactly supported functions. This means that the two functions φ and ψ will vanish outside some finite interval on the real line.
The two number systems are further subjected to orthogonality relations, of which
$${\displaystyle \sum_{i\in \mathrm{\mathbb{Z}}}}{\overline{h}}_i{h}_{i+2k}=\frac{1}{2}{\delta}_{0,k}$$
(8)
is the best known.
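As a concrete check of (Eq. 8), the sketch below uses the Daubechies four-tap low-pass coefficients, scaled so that they sum to 1 as in the low-pass normalization above. The specific numerical values are a standard example supplied here as an assumption; they are not listed in the text.

```python
import math

s3 = math.sqrt(3.0)
# Daubechies four-tap low-pass coefficients, scaled so that sum(h) = 1
# (values supplied here for illustration).
h = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]

def shifted_correlation(h, k):
    """Sum over i of conj(h_i) * h_{i+2k}, with h_i = 0 outside the taps."""
    n = len(h)
    total = 0.0
    for i in range(n):
        j = i + 2 * k
        if 0 <= j < n:
            total += h[i].conjugate() * h[j]
    return total

print(sum(h))                                               # low-pass normalization
print([shifted_correlation(h, k) for k in (-1, 0, 1)])      # compare with Eq. 8
```

Only the shift k = 0 gives a nonzero value, namely 1/2, in agreement with the right-hand side of (Eq. 8).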

The systems h and g are the low-pass and high-pass filter coefficients, respectively. In (Eq. 6), a¹ denotes the first averaged image, which consists of average intensity values of the original image. Note that only the φ function, the V space, and the h coefficients are used here. Similarly, h¹ denotes the first detail image of horizontal components, which consists of intensity differences along the vertical axis of the original image. Note that the φ function is used on y and the ψ function on x, with the W space for x values and the V space for y values; both h and g coefficients are used accordingly. The data v¹ denotes the first detail image of vertical components, which consists of intensity differences along the horizontal axis of the original image. Note that the φ function is used on x and the ψ function on y, with the W space for y values and the V space for x values; both h and g coefficients are used accordingly. Finally, d¹ denotes the first detail image of diagonal components, which consists of intensity differences along the diagonal axis of the original image. Note that only the ψ function, the W space, and the g coefficients are used here. The original image is reconstructed from the decomposed image by taking the sum of the averaged image and the detail images and scaling by a scaling factor. See Walker (1999), Song (2006b).

This decomposition is not limited to one step; it can be repeated again and again on the averaged image, depending on the size of the image. Once it stops at a certain level, quantization (see Skodras et al. 2001; Usevitch 2001) is done on the image. This quantization step may be lossy or lossless. Then lossless entropy encoding is done on the decomposed and quantized image.
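The one-level split of (Eq. 6) can be sketched for the Haar filter. The code below is a minimal illustration under assumptions made here: plain averages and half-differences over 2 × 2 blocks are used as the normalization, and the function names `haar_level1` and `haar_inverse1` are hypothetical. Quantization and entropy coding are not included.

```python
def haar_level1(image):
    """One-level 2-D Haar split of an even-sized image (list of rows).

    Returns the subimages a, h, v, d, each of size N/2 by M/2.
    Plain averages and half-differences are an assumed normalization.
    """
    n, m = len(image), len(image[0])
    assert n % 2 == 0 and m % 2 == 0, "both dimensions must be even"
    a = [[0.0] * (m // 2) for _ in range(n // 2)]
    h = [[0.0] * (m // 2) for _ in range(n // 2)]
    v = [[0.0] * (m // 2) for _ in range(n // 2)]
    d = [[0.0] * (m // 2) for _ in range(n // 2)]
    for i in range(n // 2):
        for j in range(m // 2):
            p, q = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            r, s = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            a[i][j] = (p + q + r + s) / 4   # average of the 2x2 block
            h[i][j] = (p + q - r - s) / 4   # difference along the vertical axis
            v[i][j] = (p - q + r - s) / 4   # difference along the horizontal axis
            d[i][j] = (p - q - r + s) / 4   # diagonal difference
    return a, h, v, d

def haar_inverse1(a, h, v, d):
    """Rebuild the image from the four subimages by sums and differences."""
    n2, m2 = len(a), len(a[0])
    image = [[0.0] * (2 * m2) for _ in range(2 * n2)]
    for i in range(n2):
        for j in range(m2):
            image[2 * i][2 * j] = a[i][j] + h[i][j] + v[i][j] + d[i][j]
            image[2 * i][2 * j + 1] = a[i][j] + h[i][j] - v[i][j] - d[i][j]
            image[2 * i + 1][2 * j] = a[i][j] - h[i][j] + v[i][j] - d[i][j]
            image[2 * i + 1][2 * j + 1] = a[i][j] - h[i][j] - v[i][j] + d[i][j]
    return image

picture = [[1, 2, 5, 5],
           [3, 4, 5, 5],
           [0, 8, 2, 6],
           [0, 8, 2, 6]]
a, h, v, d = haar_level1(picture)
print(a, h, v, d)
print(haar_inverse1(a, h, v, d))   # reproduces `picture`
```

Repeating `haar_level1` on the subimage `a` gives the next level of the decomposition.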

The relevance of the system of identities (Eq. 8) may be summarized as follows. Set
$$\begin{array}{l}{m}_0(z):=\frac{1}{2}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}{h}_k{z}^k\kern0.5em \mathrm{for}\kern0.5em \mathrm{all}\kern0.5em z\in \mathbb{T};\\ {}\kern0.36em {g}_k:={\left(-1\right)}^k{\overline{h}}_{1-k}\kern0.5em \mathrm{for}\kern0.5em \mathrm{all}\kern0.5em k\in \mathrm{\mathbb{Z}};\\ {}\kern1.2em {m}_1(z):=\frac{1}{2}{\displaystyle \sum_{k\in \mathrm{\mathbb{Z}}}}{g}_k{z}^k;\kern0.5em \mathrm{and}\\ {}\left({S}_jf\right)(z)=\sqrt{2}{m}_j(z)f\left({z}^2\right),\kern0.5em \mathrm{for}\kern0.5em j=0,1,\kern0.5em f\in {L}^2\left(\mathbb{T}\right),\kern0.5em z\in \mathbb{T}.\end{array}$$
Then the following conditions are equivalent:
(a) The system of Eq. 8 is satisfied.

(b) The operators S_0 and S_1 satisfy the Cuntz relations.

(c) We have perfect reconstruction in the subband system of Fig. 3.

Note that the two operators S_0 and S_1 have equivalent matrix representations. Recall that by Parseval’s formula, we have $${L}^2\left(\mathbb{T}\right)\simeq {l}^2\left(\mathrm{\mathbb{Z}}\right)$$. So representing S_0 instead as an ∞ × ∞ matrix acting on column vectors x = (x_j)_{j∈ℤ}, we get
$${\left({S}_0x\right)}_i=\sqrt{2}{\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{h}_{i-2j}{x}_j$$
and for the adjoint operator F 0 := S 0 * , we get the matrix representation
$${\left({F}_0x\right)}_i=\frac{1}{\sqrt{2}}{\displaystyle \sum_{j\in \mathrm{\mathbb{Z}}}}{\overline{h}}_{i-2j}{x}_j$$
with the overbar signifying complex conjugation. There is computational significance to the two matrix representations: the matrices for both S_0 and F_0 := S_0* are slanted. However, the slanting of one is the mirror image of the other.

#### Significance of Slanting

The slanted matrix representations refer to the corresponding operators in L². In general, operators in Hilbert function spaces have many matrix representations, one for each orthonormal basis (ONB), but here we are concerned with the ONB consisting of the Fourier frequencies z^j, j ∈ ℤ. So in our matrix representations for the S operators and their adjoints, we will be acting on column vectors, each infinite column representing a vector in the sequence space l². A vector in l² is said to be of finite size if it has only a finite set of nonzero entries.

It is the matrix F_0 that is effective for iterated matrix computation. The reason is this: when a column vector x of a fixed size, say 2s, is multiplied or acted on by F_0, the result is a vector y of half the size, i.e., of size s. So y = F_0 x. If we use F_0 and F_1 together on x, then we get two vectors, each of size s, the other one being z = F_1 x, and we can form the combined column vector by stacking y on top of z. In our application, y represents averages, while z represents local differences; hence the wavelet algorithm:
$$\begin{array}{c}\left[\begin{array}{c}\hfill \vdots \hfill \\ {}\hfill {y}_{-1}\hfill \\ {}\hfill {y}_0\hfill \\ {}\hfill {y}_1\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill --\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill {z}_{-1}\hfill \\ {}\hfill {z}_0\hfill \\ {}\hfill {z}_1\hfill \\ {}\hfill \vdots \hfill \end{array}\right]=\left[\begin{array}{c}\hfill {F}_0\hfill \\ {}\hfill --\hfill \\ {}\hfill {F}_1\hfill \end{array}\right]\left[\begin{array}{c}\hfill \vdots \hfill \\ {}\hfill {x}_{-2}\hfill \\ {}\hfill {x}_{-1}\hfill \\ {}\hfill {x}_0\hfill \\ {}\hfill {x}_1\hfill \\ {}\hfill {x}_2\hfill \\ {}\hfill \vdots \hfill \end{array}\right]\\ {}y={F}_0x\\ {}z={F}_1x\end{array}$$
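The halving step y = F_0 x, z = F_1 x and the reconstruction from the stacked vector can be sketched on a finite vector. The code below is a minimal illustration under assumptions made here: the Haar filters are rescaled to unit norm (c = √2·h in terms of the normalization Σ h_i = 1) so that analysis and synthesis are exact adjoints, indices are taken periodically, and the prefactors displayed in the formulas above are not reproduced. The names `analysis` and `synthesis` are hypothetical.

```python
import math

# Haar filters rescaled to unit norm (assumption: c = sqrt(2) * h with sum(h) = 1).
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # averages
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # local differences

def analysis(c, x):
    """Filter with conj(c) and keep every other sample (periodic indexing).

    Maps a vector of even size 2s to a vector of size s.
    """
    n = len(x)
    return [sum(ck.conjugate() * x[(2 * i + k) % n] for k, ck in enumerate(c))
            for i in range(n // 2)]

def synthesis(c, y):
    """Adjoint of `analysis`: maps a vector of size s to one of size 2s."""
    n = 2 * len(y)
    x = [0.0] * n
    for j, yj in enumerate(y):
        for k, ck in enumerate(c):
            x[(2 * j + k) % n] += ck * yj
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
y = analysis(LOW, x)     # plays the role of y = F_0 x
z = analysis(HIGH, x)    # plays the role of z = F_1 x
stacked = y + z          # y stacked on top of z
rebuilt = [p + q for p, q in zip(synthesis(LOW, y), synthesis(HIGH, z))]
print(stacked)
print(rebuilt)           # agrees with x up to rounding
```

Applying `analysis` again to `y` alone gives the next coarser level, which is why the halving property matters for iteration.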

### Connections to Group Theory

The first line in the two tables below is the continuous wavelet transform. It comes from what in physics is called coherent vector decompositions. Both transforms apply to vectors in Hilbert space ℋ, and ℋ may vary from case to case. Common to all transforms is vector input and output. If the input agrees with the output, we say that the combined process yields the identity operator 1 : ℋ → ℋ, also written 1_ℋ. So, for example, if (S_i)_{i=0}^{N−1} is a finite operator system, the input/output operator example may take the form
$${\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}={1}_{\mathrm{\mathcal{H}}}.$$
Summary of and variations on the resolution of the identity operator 1 in L² or in ℓ², for ψ and $$\tilde{\psi}$$ where $${\psi}_{r,s}(x)={r}^{-\frac{1}{2}}\psi \left(\frac{x-s}{r}\right)$$,
$${C}_{\psi }={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}\frac{ d\omega}{\left|\omega \right|}{\left|\widehat{\psi}\left(\omega \right)\right|}^2<\infty,$$
similarly for $$\tilde{\psi}$$ and $${C}_{\psi, \tilde{\psi}}={\displaystyle \underset{\mathrm{\mathbb{R}}}{\int }}\frac{ d\omega}{\left|\omega \right|}\overline{\widehat{\psi}\left(\omega \right)}\,\widehat{\tilde{\psi}}\left(\omega \right)$$:
| N = 2 | Over-complete basis | Dual basis |
| --- | --- | --- |
| Continuous resolution | $$C_{\psi}^{-1}\iint_{\mathbb{R}^2}\frac{dr\,ds}{r^2}\,\lvert\psi_{r,s}\rangle\langle\psi_{r,s}\rvert=1$$ | $$C_{\psi,\tilde{\psi}}^{-1}\iint_{\mathbb{R}^2}\frac{dr\,ds}{r^2}\,\lvert\psi_{r,s}\rangle\langle\tilde{\psi}_{r,s}\rvert=1$$ |
| Discrete resolution | $$\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}\lvert\psi_{j,k}\rangle\langle\psi_{j,k}\rvert=1$$, with $$\psi_{j,k}$$ corresponding to $$r=2^{-j}$$, $$s=k2^{-j}$$ | $$\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}\lvert\psi_{j,k}\rangle\langle\tilde{\psi}_{j,k}\rvert=1$$ |
| N ≥ 2 | Isometries in ℓ² | Dual operator system in ℓ² |
| Sequence spaces | $$\sum_{i=0}^{N-1}S_iS_i^{*}=1$$, where $$S_0,\dots,S_{N-1}$$ are adjoints to the quadrature mirror filter operators $$F_i$$, i.e., $$S_i=F_i^{*}$$ | $$\sum_{i=0}^{N-1}S_i\tilde{S}_i^{*}=1$$, for a dual operator system $$S_0,\dots,S_{N-1}$$, $$\tilde{S}_0,\dots,\tilde{S}_{N-1}$$ |
Then the assertions in the first table amount to
| | Over-complete basis | Dual basis |
| --- | --- | --- |
| Continuous resolution | $$C_{\psi}^{-1}\iint_{\mathbb{R}^2}\frac{dr\,ds}{r^2}\,\lvert\langle\psi_{r,s}\mid f\rangle\rvert^2=\Vert f\Vert_{L^2}^2\quad\forall f\in L^2(\mathbb{R})$$ | $$C_{\psi,\tilde{\psi}}^{-1}\iint_{\mathbb{R}^2}\frac{dr\,ds}{r^2}\,\langle f\mid\psi_{r,s}\rangle\langle\tilde{\psi}_{r,s}\mid g\rangle=\langle f\mid g\rangle\quad\forall f,g\in L^2(\mathbb{R})$$ |
| Discrete resolution | $$\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}\lvert\langle\psi_{j,k}\mid f\rangle\rvert^2=\Vert f\Vert_{L^2}^2\quad\forall f\in L^2(\mathbb{R})$$ | $$\sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}\langle f\mid\psi_{j,k}\rangle\langle\tilde{\psi}_{j,k}\mid g\rangle=\langle f\mid g\rangle\quad\forall f,g\in L^2(\mathbb{R})$$ |
| Sequence spaces | $$\sum_{i=0}^{N-1}\Vert S_i^{*}c\Vert^2=\Vert c\Vert^2\quad\forall c\in\ell^2$$ | $$\sum_{i=0}^{N-1}\langle S_i^{*}c\mid\tilde{S}_i^{*}d\rangle=\langle c\mid d\rangle\quad\forall c,d\in\ell^2$$ |
A function ψ satisfying the resolution identity is called a coherent vector in mathematical physics. The representation theory for the (ax + b) group, i.e., the matrix group $$G=\left\{\left(\begin{array}{cc}a & b\\ 0 & 1\end{array}\right)\ \Big|\ a\in {\mathrm{\mathbb{R}}}_{+},\ b\in \mathrm{\mathbb{R}}\right\}$$, serves as its underpinning. Then the tables above illustrate how the {ψ_{j,k}} wavelet system arises from a discretization of the following unitary representation of G:
$$\left({U}_{\left(\begin{array}{cc}a & b\\ 0 & 1\end{array}\right)}f\right)(x)={a}^{-\frac{1}{2}}f\left(\frac{x-b}{a}\right)$$
acting on L²(ℝ). This unitary representation also explains the discretization step in passing from the first line to the second in the tables above. The functions {ψ_{j,k} | j, k ∈ ℤ} which make up a wavelet system result from the choice of a suitable coherent vector ψ ∈ L²(ℝ) and then setting
$${\psi}_{j,k}(x)=\left({U}_{\left(\begin{array}{cc}{2}^{-j} & k\cdot {2}^{-j}\\ 0 & 1\end{array}\right)}\psi \right)(x)={2}^{\frac{j}{2}}\psi \left({2}^jx-k\right).$$

Even though this representation lies at the historical origin of the subject of wavelets, the (ax + b) group seems to be now largely forgotten in the next generation of the wavelet community. But Chaps. 1–3 of Daubechies (1992) still serve as a beautiful presentation of this (now much ignored) side of the subject. It also serves as a link to mathematical physics and to classical analysis.

## Tools from Mathematics

In our presentation, we will rely on tools from at least three separate areas of mathematics, and we will outline how they interact to form a coherent theory and how they come together to form a link between what is now called the discrete and the continuous wavelet transform. It is the discrete case that is popular with engineers (Aubert and Kornprobst 2006; Liu 2006; Strang 1997, 2000), while the continuous case has come to play a central role in the part of mathematics referred to as harmonic analysis (Daubechies 1993). The three areas are operator algebras, dynamical systems, and basis constructions:
(a) Operator algebras. The theory of operator algebras in turn breaks up in two parts: One is the study of “the algebras themselves” as they emerge from the axioms of von Neumann (von Neumann algebras) and Gelfand, Kadison, and Segal (C*-algebras). The other has a more applied slant: It involves “the representations” of the algebras. By this, we refer to the following: The algebras will typically be specified by generators and by relations and by a certain norm completion, in any case by a system of axioms. This holds both for the norm-closed algebras, the so-called C*-algebras, and for the weakly closed algebras, the von Neumann algebras. In fact there is a close connection between the two parts of the theory: For example, representations of C*-algebras generate von Neumann algebras.

To talk about representations of a fixed algebra, say A, we must specify a Hilbert space ℋ and a homomorphism ρ from A into the algebra ℬ(ℋ) of all bounded operators on ℋ. We require that ρ sends the identity element in A into the identity operator acting on ℋ and that ρ(a*) = (ρ(a))*, where the last star now refers to the adjoint operator.

It was realized in the last 10 years (see, e.g., Bratelli and Jorgensen 2002; Jorgensen 2006a, b) that a family of wavelets, which are basis constructions in harmonic analysis, in signal/image analysis, and in computational mathematics, may be built up from representations of an especially important family of simple C*-algebras, the Cuntz algebras. The Cuntz algebras are denoted $${\mathcal{O}}_2,{\mathcal{O}}_3,\dots,$$ including $${\mathcal{O}}_{\infty }$$.
(b) Dynamical systems. The connection between the Cuntz algebras $${\mathcal{O}}_N$$ for N = 2, 3, … is relevant to the kind of dynamical systems which are built on branching laws, the case of $${\mathcal{O}}_N$$ representing N-fold branching. The reason for this is that if N is fixed, $${\mathcal{O}}_N$$ includes in its definition an iterated subdivision, but within the context of Hilbert space. For more details, see, e.g., Dutkay (2004), Dutkay and Roysland (2007a), Dutkay and Jorgensen (2005, 2006a, b, c), Jorgensen (2006b).

(c) Analysis of bases in function spaces. The connection to basis constructions using wavelets is this: The context for wavelets is a Hilbert space ℋ, where ℋ may be L²(ℝ^d), where d is a dimension, d = 1 for the line (signals), d = 2 for the plane (images), etc. The more successful bases in Hilbert space are the orthonormal bases (ONBs), but until the mid 1980s, there were no ONBs in L²(ℝ^d) which were entirely algorithmic and effective for computations. One reason for this is that the tools that had been used for 200 years since Fourier involved basis functions (Fourier wave functions) which were not localized. Moreover, these existing Fourier tools were not friendly to algorithmic computations.

## A Transfer Operator

A popular tool for deciding if a candidate for a wavelet basis is in fact an ONB uses a certain transfer operator. Variants of this operator are used in diverse areas of applied mathematics. It is an operator which involves a weighted average over a finite set of possibilities. Hence, it is natural for understanding random walk algorithms. As remarked in, for example, Jorgensen (2003, 2006a, b), Dutkay (2004), it was also studied in physics, for example, by David Ruelle, who used it to prove results on phase transition for infinite spin systems in quantum statistical mechanics. In fact the transfer operator has many incarnations (many of them known as Ruelle operators), all of them based on N-fold branching laws.

In our wavelet application, the Ruelle operator weights the input over the N branch possibilities; the weighting is assigned by a chosen scalar function W, and the W-Ruelle operator is denoted R_W. In the wavelet setting there is in addition a low-pass filter function m_0, which in its frequency response formulation is a function on the d-torus 𝕋^d = ℝ^d/ℤ^d.

Since the scaling matrix A has integer entries, A passes to the quotient ℝ d /ℤ d , and the induced transformation $${r}_A:{\mathbb{T}}^d\to {\mathbb{T}}^d$$ is an N-fold cover, where N = |detA|, i.e., for every x in $${\mathbb{T}}^d$$, there are N distinct points y in $${\mathbb{T}}^d$$ solving r A (y) = x.

In the wavelet case, the weight function W is W = |m_0|². Then with this choice of W, the ONB problem for a candidate for a wavelet basis in the Hilbert space L²(ℝ^d), as it turns out, may be decided by the dimension of a distinguished eigenspace for R_W, by the so-called Perron–Frobenius problem.
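A one-dimensional instance of the W-Ruelle operator can be sketched as follows. The choices below are assumptions made for illustration: d = 1, scaling by N = 2, the Haar low-pass filter m_0(x) = (1 + e^{2πix})/2 normalized so that m_0(0) = 1, and the convention (R_W f)(x) = Σ W(y) f(y) over the N points y with Ny = x mod 1, without a 1/N prefactor. Other normalizations of m_0 and R_W appear in the literature.

```python
import cmath
import math

def m0(x):
    """Haar low-pass filter on the circle R/Z, normalized so that m0(0) = 1 (assumption)."""
    return (1 + cmath.exp(2j * math.pi * x)) / 2

def W(x):
    """Weight function W = |m0|^2."""
    return abs(m0(x)) ** 2

def ruelle(f, N=2, weight=W):
    """Return R_W f, where (R_W f)(x) sums weight(y) * f(y) over the N points y with N*y = x (mod 1)."""
    def Rf(x):
        return sum(weight((x + k) / N) * f((x + k) / N) for k in range(N))
    return Rf

constant_one = lambda x: 1.0
R_one = ruelle(constant_one)
print([R_one(x / 10) for x in range(10)])   # each value is 1 up to rounding
```

In this sketch the constant function is fixed by R_W, so it is an eigenfunction with eigenvalue 1; the eigenspace for this eigenvalue is the kind of object the Perron–Frobenius problem above refers to.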

This has worked well for years for the wavelets which have an especially simple algorithm, the wavelets that are initialized by a single function, called the scaling function. These are called the multiresolution analysis (MRA) wavelets, or for short the MRA wavelets. But there are instances, for example, if a problem must be localized in the frequency domain, when the MRA wavelets do not suffice, and the construction will by necessity include more than one scaling function. We are then back to trying to decide if the output from the discrete algorithm and the $${\mathcal{O}}_N$$ representation is an ONB or if it has some stability property which will serve the same purpose in cases where asking for an ONB is not feasible.

## Future Directions

The idea of a scientific analysis by subdividing a fixed picture or object into its finer parts is not unique to wavelets. It works best for structures with an inherent self-similarity; this self-similarity can arise from numerical scaling of distances. But there are more subtle nonlinear self-similarities. The Julia sets in the complex plane are a case in point (Braverman and Yampolsky 2006; Braverman 2006; Devaney and Look 2006; Devaney et al. 2007; Milnor 2004; Petersen and Zakeri 2004). The simplest Julia sets come from a one-parameter family of quadratic polynomials φ_c(z) = z² + c, where z is a complex variable and where c is a fixed parameter. The corresponding Julia sets J_c have a surprisingly rich structure. A simple way to understand them is the following: Consider the two branches of the inverse $${\beta}_{\pm }=z\mapsto \pm \sqrt{z-c}$$. Then J_c is the unique minimal nonempty compact subset of ℂ which is invariant under {β_±}. (There are alternative ways of presenting J_c, but this one fits our purpose. The Julia set J of a holomorphic function, in this case z ↦ z² + c, informally consists of those points whose long-time behavior under repeated iteration, or rather iteration of substitutions, can change drastically under arbitrarily small perturbations.) Here “long time” refers to large n, where φ^(n+1)(z) = φ(φ^(n)(z)), n = 0, 1, …, and φ^(0)(z) = z (Figs. 6 and 7).
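The description of J_c through the two inverse branches β_± suggests a simple way to generate points near the Julia set: apply a randomly chosen branch repeatedly. The sketch below is an illustration under assumptions made here (the starting point, the number of discarded initial steps, and the function name `julia_points` are all hypothetical choices); it is not one of the wavelet algorithms discussed in the text.

```python
import cmath
import random

def julia_points(c, n_points=2000, burn_in=50, seed=0):
    """Backward orbit of z -> z^2 + c under randomly chosen inverse branches.

    Each step replaces z by +sqrt(z - c) or -sqrt(z - c). After a burn-in,
    the visited points are returned as an approximation to points of J_c.
    """
    rng = random.Random(seed)
    z = complex(1.0, 1.0)          # arbitrary starting point (assumption)
    points = []
    for step in range(burn_in + n_points):
        w = cmath.sqrt(z - c)
        z = w if rng.random() < 0.5 else -w
        if step >= burn_in:
            points.append(z)
    return points

points = julia_points(complex(-1.0, 0.0))
print(len(points), points[:3])
```

For c = 0 the points produced this way lie on the unit circle up to rounding, which is the Julia set of z ↦ z².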

It would be interesting to adapt and modify the Haar wavelet and the other wavelet algorithms to the Julia sets. The two papers (Dutkay and Jorgensen 2005, 2006b) initiated such a development. Then an attempt to adapt and modify the Haar wavelet to the Julia sets was made (Dutkay et al. 2012); however, there were some limitations in finding the filters. Perhaps trying another fractal set such as tent map or others may work.

### Orthonormal Bases Generated by Cuntz Algebras

We present new results from (Dutkay et al. 2012) by borrowing section “Introduction” and part of section “Definition” from (Dutkay et al. 2012) in the rest of section “Orthonormal Bases Generated by Cuntz Algebras.” It gives a general criterion for a family generated by the Cuntz isometries to be an orthonormal basis.

### Theorem 1

Dutkay et al. (2012) Let ℋ be a Hilbert space and (S_i)_{i=0}^{N−1} be a representation of the Cuntz algebra $${\mathcal{O}}_N$$. Let ℰ be an orthonormal set in ℋ and f : X → ℋ a norm continuous function on a topological space X with the following properties:

(i) $$\mathrm{\mathcal{E}}={{\displaystyle \cup}}_{i=0}^{N-1}{S}_i\mathrm{\mathcal{E}}.$$

(ii) $$\overline{\mathrm{span}}\left\{f(t):t\in X\right\}=\mathrm{\mathcal{H}}$$ and ||f(t)|| = 1, for all t ∈ X.

(iii) There exist functions $${\mathfrak{m}}_i:X\to \mathrm{\mathbb{C}}$$, g_i : X → X, i = 0, …, N − 1, such that
$${S}_i^{*}f(t)={\mathfrak{m}}_i(t)f\left({g}_i(t)\right),\kern1em t\in X.$$
(9)

(iv) There exists c_0 ∈ X such that $$f\left({c}_0\right)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}.$$

(v) The only functions $$h\in \mathcal{C}(X)$$ with h ≥ 0, h(c) = 1, $$\forall c\in \left\{x\in X:f(x)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}\right\}$$, and
$$h(t)={\displaystyle \sum_{i=0}^{N-1}}\left|{\mathfrak{m}}_i(t)\right|{}^2h\left({g}_i(t)\right),\kern1em t\in X$$
(10)
are the constant functions.

Then ℰ is an orthonormal basis for ℋ.

### Proof

Define
$$h(t):={\displaystyle \sum_{e\in \mathrm{\mathcal{E}}}}{\left|\left\langle f(t),e\right\rangle \right|}^2={\left\Vert Pf(t)\right\Vert}^2,\kern1em t\in X$$
where P is the orthogonal projection onto the closed linear span of ℰ.
Since t ↦ f(t) is norm continuous, we get that h is continuous. Clearly h ≥ 0. Also, if $$f(c)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}$$, then ||Pf(c)|| = ||f(c)|| = 1 so h(c) = 1. In particular, from (ii) and (iv), h(c_0) = 1. We check (Eq. 10). Since the sets S_i ℰ, i = 0, …, N − 1, are mutually orthogonal, the union in (i) is disjoint. Therefore, for all t ∈ X,
$$\begin{array}{l}h(t)={\displaystyle \sum_{i=0}^{N-1}}{\displaystyle \sum_{e\in \mathrm{\mathcal{E}}}}{\left|\left\langle f(t),{S}_ie\right\rangle \right|}^2={\displaystyle \sum_{i=0}^{N-1}}{\displaystyle \sum_{e\in \mathrm{\mathcal{E}}}}{\left|\left\langle {S}_i^{*}f(t),e\right\rangle \right|}^2={\displaystyle \sum_{i=0}^{N-1}}{\left|{\mathfrak{m}}_i(t)\right|}^2{\displaystyle \sum_{e\in \mathrm{\mathcal{E}}}}{\left|\left\langle f\left({g}_i(t)\right),e\right\rangle \right|}^2\\ {}\kern7.8em ={\displaystyle \sum_{i=0}^{N-1}}{\left|{\mathfrak{m}}_i(t)\right|}^2h\left({g}_i(t)\right)\end{array}$$

By (v), h is constant and, since h(c_0) = 1, h(t) = 1 for all t ∈ X. Then ||Pf(t)|| = 1 for all t ∈ X. Since ||f(t)|| = 1, it follows that $$f(t)\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}$$ for all t ∈ X. But the vectors f(t) span ℋ so $$\overline{\mathrm{span}}\mathrm{\mathcal{E}}=\mathrm{\mathcal{H}}$$ and ℰ is an orthonormal basis.

### Remark 2

Dutkay et al. (2012) The operators of the form
$$Rh(t)={\displaystyle \sum_{i=0}^{N-1}}\left|{\mathfrak{m}}_i(t)\right|{}^2h\left({g}_i(t)\right),\kern1em t\in X,h\in C(X),$$
which appear in (Eq. 10), are sometimes called Ruelle operators or transfer operators; see, e.g., (Baladi 2000).

### Example 3

Dutkay et al. (2012) We consider affine iterated function systems with no overlap. Let R be a d × d expansive real matrix, i.e., all the eigenvalues of R have absolute value strictly greater than 1. Let B ⊂ ℝ^d be a finite set such that N = |B|. Define the affine iterated function system:
$${\tau}_b(x)={R}^{-1}\left(x+b\right)\kern1em \left(x\in {\mathrm{\mathbb{R}}}^d,\kern0.5em b\in B\right)$$
(11)
By (Hutchinson 1981), there exists a unique compact subset X B of ℝ d which satisfies the invariance equation
$${X}_B={{\displaystyle \cup}}_{b\in B}{\tau}_b\left({X}_B\right)$$
(12)
X_B is called the attractor of the iterated function system (τ_b)_{b∈B}. Moreover, X_B is given by
$${X}_B=\left\{{\displaystyle \sum_{k=1}^{\infty }}{R}^{-k}{b}_k\kern0.5em :\kern0.5em {b}_k\in B\kern0.5em \mathrm{for}\kern0.5em \mathrm{all}\kern0.5em k\ge 1\right\}$$
(13)
Also from (Hutchinson 1981), there is a unique probability measure μ B on ℝ d satisfying the invariance equation
$${\displaystyle \int } fd{\mu}_B=\frac{1}{N}{\displaystyle \sum_{b\in B}}{\displaystyle \int }f\circ {\tau}_bd{\mu}_B$$
(14)
for all continuous compactly supported functions f on ℝ^d. We call μ_B the invariant measure for the iterated function system (IFS) (τ_b)_{b∈B}. By (Hutchinson 1981), μ_B is supported on the attractor X_B. We say that the IFS has no overlap if μ_B(τ_b(X_B) ∩ τ_{b′}(X_B)) = 0 for all b ≠ b′ in B.
Assume that the IFS (τ_b)_{b∈B} has no overlap. Define the map r : X_B → X_B:
$$r(x)={\tau}_b^{-1}(x),\kern0.5em \mathrm{if}\kern0.5em x\in {\tau}_b\left({X}_B\right)$$
(15)

Then r is an N-to-1 onto map and μ_B is strongly invariant for r. Note that r^{−1}(x) = {τ_b(x) : b ∈ B} for μ_B-a.e. x ∈ X_B.
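The objects in (Eqs. 11–15) can be made concrete in dimension d = 1. The sketch below assumes, for illustration only, R = 4 and B = {0, 2}; these values are not prescribed by the text. It lists truncated versions of the sums in (Eq. 13), checks the invariance (Eq. 12) at finite depth, and implements the map r of (Eq. 15) for this particular example.

```python
from itertools import product

R = 4          # scaling factor (illustrative assumption)
B = (0, 2)     # digit set (illustrative assumption)

def tau(b, x):
    """The affine map tau_b(x) = R^{-1}(x + b) of Eq. 11 in dimension 1."""
    return (x + b) / R

def attractor_points(depth):
    """All truncated sums sum_{k=1}^{depth} R^{-k} b_k with b_k in B (Eq. 13, truncated)."""
    return sorted(sum(b / R ** k for k, b in enumerate(digits, start=1))
                  for digits in product(B, repeat=depth))

def r(x):
    """Eq. 15 for this example: invert whichever branch tau_b contains x."""
    b = 0 if x < 0.25 else 2    # tau_0(X_B) lies in [0, 1/6], tau_2(X_B) in [1/2, 2/3]
    return R * x - b

level5 = attractor_points(5)
level6 = attractor_points(6)
# Finite-depth version of the invariance equation (Eq. 12).
print(sorted(tau(b, x) for b in B for x in level5) == level6)
# r undoes each branch tau_b.
print(all(r(tau(b, x)) == x for b in B for x in level5))
```

The two images τ_0(X_B) and τ_2(X_B) are separated by a gap in this example, so the no-overlap condition holds and r is 2-to-1.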

We apply Theorem 1 to the setting of Example 3, in dimension d = 1 for affine iterated function systems, when the set $$\frac{1}{R}B$$ has a spectrum L (Dutkay et al. 2012).

### Definition 4

Dutkay et al. (2012) Let L ⊂ ℝ, |L| = N, R > 1 be such that L is a spectrum for the set $$\frac{1}{R}B$$. We say that c ∈ ℝ is an extreme cycle point for (B, L) if there exist l_0, l_1, …, l_{p−1} in L such that, if c_0 = c, $${c}_1=\frac{c_0+{l}_0}{R},{c}_2=\frac{c_1+{l}_1}{R}\dots {c}_{p-1}=\frac{c_{p-2}+{l}_{p-2}}{R}$$, then $$\frac{c_{p-1}+{l}_{p-1}}{R}={c}_0$$, and |m_B(c_i)| = 1 for i = 0, …, p − 1, where
$${m}_B(x)=\frac{1}{N}{\displaystyle \sum_{b\in B}}{e}^{2\pi ibx}\kern1em x\in \mathrm{\mathbb{R}}.$$
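Extreme cycle points of small period can be found by direct search. The sketch below is an illustration under assumptions made here: R = 4 and B = {0, 2}, with the two candidate sets L = {0, 1} and L = {0, 3}; for both, the matrix with entries e^{2πibl/R}/√2 (b ∈ B, l ∈ L) is unitary, which is the sense in which L is taken to be a spectrum of (1/R)B. The function name and the bound on the period are hypothetical. A cycle with digits l_0, …, l_{p−1} starts at the fixed point c_0 = (Σ_i l_i R^i)/(R^p − 1) of the composed maps.

```python
import cmath
import math
from fractions import Fraction
from itertools import product

def m_B(x, B):
    """m_B(x) = (1/N) * sum over b in B of exp(2*pi*i*b*x)."""
    return sum(cmath.exp(2j * math.pi * b * float(x)) for b in B) / len(B)

def extreme_cycle_points(B, L, R, max_period=3, tol=1e-9):
    """Search all digit words of length <= max_period for extreme cycles of (B, L)."""
    found = set()
    for p in range(1, max_period + 1):
        for word in product(L, repeat=p):
            # Starting point of the cycle determined by the word l_0, ..., l_{p-1}.
            c = Fraction(sum(l * R ** i for i, l in enumerate(word)), R ** p - 1)
            cycle = []
            for l in word:
                cycle.append(c)
                c = (c + l) / R      # c_{i+1} = (c_i + l_i) / R, exact arithmetic
            # Keep the cycle only if |m_B| = 1 at every one of its points.
            if all(abs(abs(m_B(x, B)) - 1) < tol for x in cycle):
                found.update(cycle)
    return sorted(found)

print(extreme_cycle_points((0, 2), (0, 1), 4))
print(extreme_cycle_points((0, 2), (0, 3), 4))
```

With L = {0, 1} only the trivial point c = 0 is found, while L = {0, 3} also yields the fixed point c = 1 of t ↦ (t + 3)/4.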

### Proposition 5

Dutkay et al. (2012) Let (m_i)_{i=0}^{N−1} be a QMF basis. Define the operators on L²(X, μ):
$${S}_i(f)={m}_if\circ r,\kern1em i=0,\dots, N-1$$
(16)
Then the operators S i are isometries and they form a representation of the Cuntz algebra $${\mathcal{O}}_N$$ , i.e.,
$${S}_i^{*}{S}_j={\delta}_{ij},\kern1em i,j=0,\dots, N-1,\kern2em {\displaystyle \sum_{i=0}^{N-1}}{S}_i{S}_i^{*}=I$$
(17)
The adjoint of S i is given by the formula
$${S}_i^{*}(f)(z)=\frac{1}{N}{\displaystyle \sum_{r(w)=z}}{\overline{m}}_i(w)f(w)$$
(18)

### Proof

We compute the adjoint: Take f, g in L 2(X, μ). We use the strong invariance of μ:
$$\left\langle {S}_i^{*}f,g\right\rangle ={\displaystyle \int }f\,{\overline{m}}_i\,\overline{g\circ r}\; d\mu ={\displaystyle \int}\frac{1}{N}{\displaystyle \sum_{r(w)=z}}{\overline{m}}_i(w)f(w)\overline{g}(z)\, d\mu (z)$$

Then (Eq. 18) follows. The Cuntz relations in (Eq. 17) are then easily checked with Proposition ??

### Definition 6

Dutkay et al. (2012) We denote by L* the set of all finite words with digits in L, including the empty word. For l ∈ L let S_l be given as in (Eq. 16) where m_l is replaced by the exponential e_l. If w = l_1 l_2 … l_n ∈ L*, then by S_w we denote the composition $${S}_{l_1}{S}_{l_2}\dots {S}_{l_n}$$.

### Theorem 7

Dutkay et al. (2012) Let B ⊂ ℝ, 0 ∈ B, |B| = N, R > 1 and let μ_B be the invariant measure associated to the IFS τ_b(x) = R^{−1}(x + b), b ∈ B. Assume that the IFS has no overlap and that the set $$\frac{1}{R}B$$ has a spectrum L ⊂ ℝ, 0 ∈ L. Then the set
$$\mathrm{\mathcal{E}}(L)=\left\{{S}_w{e}_{-c}:c\ \text{is an extreme cycle point for}\ \left(B,L\right),\ w\in {L}^{*}\right\}$$

is an orthonormal basis in L 2(μ B ). Some of the vectors in ℰ(L) are repeated but we count them only once.

### Proof

Let c be an extreme cycle point. Then |m_B(c)| = 1. Using the fact that we have equality in the triangle inequality $$\left(1=\left|{m}_B(c)\right|\le \frac{1}{N}{\displaystyle \sum_{b\in B}}\left|{e}^{2\pi ibc}\right|=1\right)$$, and since 0 ∈ B, we get that e^{2πibc} = 1, so bc ∈ ℤ for all b ∈ B. Also there exists another extreme cycle point d and l ∈ L such that $$\frac{d+l}{R}=c$$. Then we have S_l e_{−c}(x) = e^{2πilx} e^{2πi(Rx−b)(−c)}, if x ∈ τ_b(X_B). Since bc ∈ ℤ and R(−c) + l = −d, we obtain
$${S}_l{e}_{-c}={e}_{-d}$$
(19)
We use this property to show that the vectors S_w e_{−c}, $${S}_{w^{\hbox{'}}}{e}_{-{c}^{\hbox{'}}}$$ are either equal or orthogonal for w, w′ in L* and c, c′ extreme cycle points for (B, L). Using (Eq. 19), we can append some letters at the end of w and w′ such that the new words have the same length:
$${S}_w{e}_{-c}={S}_{w\alpha}{e}_{-d},\kern1em {S}_{w^{\hbox{'}}}{e}_{-{c}^{\hbox{'}}}={S}_{w^{\hbox{'}}\beta }{e}_{-{d}^{\hbox{'}}},\kern1em \left| w\alpha \right|=\left|{w}^{\hbox{'}}\beta \right|\kern1em \mathrm{where}\kern0.5em d,{d}^{\hbox{'}}\kern0.5em \mathrm{are}\kern0.5em \mathrm{cycle}\kern0.5em \mathrm{points}.$$

Moreover, repeating the letters for the cycle points d and d′ as many times as we want, we can assume that α ends in a repetition of the letters associated to d and similarly for β and d′. But since |wα| = |w′β|, the Cuntz relations imply that $${S}_{w\alpha}{e}_{-d}\perp {S}_{w^{\hbox{'}}\beta }{e}_{-{d}^{\hbox{'}}}$$ or wα = w′β. Assume |w| ≤ |w′|. Then α = w″β for some word w″. Then $${S}_{w\alpha}{e}_{-d}\perp {S}_{w^{\hbox{'}}\beta }{e}_{-{d}^{\hbox{'}}}$$ iff $${S}_{\alpha }{e}_{-d}\perp {S}_{w^{\hbox{'}\hbox{'}}\beta }{e}_{-{d}^{\hbox{'}}}$$. Also, α consists of repetitions of the digits of the cycle associated to d and similarly for d′. So $${S}_{\alpha }{e}_{-d}={e}_{-f},{S}_{w^{\hbox{'}\hbox{'}}\beta }{e}_{-{d}^{\hbox{'}}}={e}_{-{f}^{\hbox{'}}}$$, and the points d, d′, f, f′, c, c′ all belong to the same cycle. So the only case when S_w e_{−c} is not orthogonal to $${S}_{w^{\hbox{'}}}{e}_{-{c}^{\hbox{'}}}$$ is when they are equal.

Next we check that the hypotheses of Theorem 1 are satisfied. We let $$f(t)=e_{-t}\in L^2\left({\mu}_B\right)$$. To check (i) we just have to see that $$e_{-c}\in {\bigcup}_{l\in L}S_l\mathrm{\mathcal{E}}(L)$$. But this follows from (Eq. 19). Requirement (ii) is clear. For (iii), we compute
$$\begin{array}{l}{S}_l^{*}{e}_{-t}(x)=\frac{1}{N}{\displaystyle \sum_{b\in B}}{e}^{-2\pi il\cdot \frac{1}{R}\left(x+b\right)}{e}^{-2\pi it\cdot \frac{1}{R}\left(x+b\right)}={e}^{-2\pi ix\cdot \frac{1}{R}\left(t+l\right)}\frac{1}{N}{\displaystyle \sum_{b\in B}}{e}^{-2\pi ib\left(\frac{t+l}{R}\right)}\\ {}\kern4.44em =\overline{m_B}\left(\frac{t+l}{R}\right){e}_{-\frac{t+l}{R}}(x)\end{array}$$

So (iii) is satisfied with $${\mathfrak{m}}_l(t)=\overline{m_B}\left(\frac{t+l}{R}\right)$$, $${g}_l(t)=\frac{t+l}{R}$$.

For (iv), take $$c_0=-c$$ for any extreme cycle point c (0 is always one). For (v), take h continuous on ℝ, 0 ≤ h ≤ 1, h(c) = 1, for all c with $${e}_{-c}\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}(L)$$, and
$$h(t)={\displaystyle \sum_{l\in L}}{\left|{m}_B\left(\frac{t+l}{R}\right)\right|}^2h\left(\frac{t+l}{R}\right)=: Rh(t)$$

In particular, we have h(c) = 1 for every extreme cycle point c. Assume $$h\kern.5em \not\equiv 1$$. First, we will restrict our attention to t ∈ I := [a, b] with $$a\le \frac{ \min L}{R-1}$$, $$b\ge \frac{ \max L}{R-1}$$, and note that $$g_l(I)\subset I$$ for all l ∈ L. Let $$m={\min}_{t\in I}h(t)$$ and assume m < 1. Let h′ = h − m. Then Rh′(t) = h′(t) for all t ∈ ℝ, h′ ≥ 0 on I, and h′ has a zero in I, say $$h'\left({z}_0\right)=0$$. But this implies that $${\left|{m}_B\left({g}_l\left({z}_0\right)\right)\right|}^2h'\left({g}_l\left({z}_0\right)\right)=0$$ for all l ∈ L. Since $${\sum}_{l\in L}{\left|{m}_B\left({g}_l\left({z}_0\right)\right)\right|}^2=1$$, it follows that for one of the $$l_0\in L$$, we have $${h}^{\hbox{'}}\left({g}_{l_0}\left({z}_0\right)\right)=0$$. By induction, we can find $${z}_n={g}_{l_{n-1}}\cdots {g}_{l_0}{z}_0$$ such that $$h'\left({z}_n\right)=0$$. We prove that $$z_0$$ is a cycle point. Suppose not. Since $$m_B$$ has finitely many zeros, for n large enough $${g}_{\alpha_k}\cdots {g}_{\alpha_1}{z}_n$$ is not a zero for $$m_B$$, for any choice of digits $${\alpha}_1,\dots, {\alpha}_k$$ in L. But then, by using the same argument as above, we get that $${h}^{\hbox{'}}\left({g}_{\alpha_k}\cdots \kern.3em {g}_{\alpha_1}{z}_n\right)=0$$ for any $${\alpha}_1,\dots, {\alpha}_k\in L$$. The points $$\left\{{g}_{\alpha_k}\cdots \kern.3em {g}_{\alpha_1}{z}_n:{\alpha}_1,\dots {\alpha}_k\in L,k\in \mathrm{\mathbb{N}}\right\}$$ are dense in the attractor $$X_L$$ of the IFS $${\left\{{g}_l\right\}}_{l\in L}$$; thus, h′ is constant 0 on $$X_L$$. But the extreme cycle points c are in $$X_L$$, and since h(c) = 1, we have 0 = h′(c) = 1 − m, so m = 1. Thus, h = 1 on I. Since we can let a → −∞ and b → ∞, we obtain that h ≡ 1.

### Remark 8

Dutkay et al. (2012) The functions in ℰ(L) are piecewise exponential. The formula for $${S}_{l_1\dots {l}_n}{e}_{-c}$$ is
$${S}_{l_1\dots {l}_n}{e}_{-c}(x)={e}^{2\pi i\alpha \left(b,l,c\right)}\cdot {e}_{l_1+R{l}_2+\dots +{R}^{n-1}{l}_n+{R}^n\left(-c\right)}(x)$$
(20)
where $$\alpha \left(b,l,c\right)=-\left[{b}_1{l}_2+\left(R{b}_1+{b}_2\right){l}_3+\dots +\left({R}^{n-2}{b}_1+\dots +{b}_{n-1}\right){l}_n\right]+\left({R}^{n-1}{b}_1+\dots +{b}_n\right)\cdot c$$ if $$x\in {\tau}_{b_1}\dots {\tau}_{b_n}{X}_B$$. We have
$${S}_{l_1}\dots {S}_{l_n}{e}_{-c}(x)={e}_{l_1}(x){e}_{l_2}(rx)\dots {e}_{l_n}\left({r}^{n-1}x\right){e}_{-c}\left({r}^nx\right)$$
If $$x\in {\tau}_{b_1}\dots {\tau}_{b_n}{X}_B$$, then $$rx\in {\tau}_{b_2}\dots {\tau}_{b_n}{X}_B$$, $${r}^{n-1}x\in {\tau}_{b_n}{X}_B$$. So
$$\begin{array}{l}\kern0.96em rx= Rx-{b}_1\\ {}\kern0.48em {r}^2x= Rrx-{b}_2={R}^2x-R{b}_1-{b}_2\\ {}\kern0.96em \vdots \\ {}{r}^{n-1}x={R}^{n-1}x-{R}^{n-2}{b}_1-\dots -R{b}_{n-2}-{b}_{n-1}\\ {}\kern0.48em {r}^nx={R}^nx-{R}^{n-1}{b}_1-{R}^{n-2}{b}_2-\dots -R{b}_{n-1}-{b}_n.\end{array}$$

The rest follows from a direct computation.
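
As a sketch of that direct computation, assuming the convention $$e_t(x)=e^{2\pi itx}$$ and the expressions for $$r^kx$$ displayed above (this expansion is illustrative and is not quoted from Dutkay et al. 2012), one can collect the exponents of the product:
$$\begin{array}{l}{S}_{l_1}\dots {S}_{l_n}{e}_{-c}(x)=\exp \left(2\pi i\left[{\displaystyle \sum_{k=1}^n}{l}_k\,{r}^{k-1}x-c\,{r}^nx\right]\right),\\ {}{\displaystyle \sum_{k=1}^n}{l}_k\,{r}^{k-1}x-c\,{r}^nx=\left({l}_1+R{l}_2+\dots +{R}^{n-1}{l}_n-{R}^nc\right)x+\alpha \left(b,l,c\right).\end{array}$$
The coefficient of x comes from the leading terms $$R^{k-1}x$$ and $$R^nx$$, while the constants $$-\left({R}^{k-2}{b}_1+\dots +{b}_{k-1}\right){l}_k$$ for k = 2, …, n, together with $$+\left({R}^{n-1}{b}_1+\dots +{b}_n\right)c$$, add up to α(b, l, c).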

### Corollary 9

Dutkay et al. (2012) In the hypothesis of Theorem 1, if in addition B, L ⊂ ℤ and R ∈ ℤ, then there exists a set Λ such that $$\left\{{e}_{\lambda }:\lambda \in \Lambda \right\}$$ is an orthonormal basis for $$L^2\left({\mu}_B\right)$$.

### Proof

If everything is an integer, then it follows from Remark 8 that $$S_we_{-c}$$ is an exponential function for all w and extreme cycle points c. Note that, as in the proof of Theorem 1, bc ∈ ℤ for all b ∈ B.

### Example 10

Dutkay et al. (2012) We consider the IFS that generates the middle third Cantor set: R = 3, B = {0, 2}. The set $$\frac{1}{3}\left\{0,2\right\}$$ has spectrum L = {0, 3/4}. We look for the extreme cycle points for (B, L).

We need $$\left|{m}_B\left(-c\right)\right|=1$$ so $$\left|\frac{1+{e}^{2\pi i2c}}{2}\right|=1$$; therefore, $$c\in \frac{1}{2}\mathrm{\mathbb{Z}}$$. Also c has to be a cycle point for the IFS $$g_0(x)=x/3$$, $${g}_{3/4}(x)=\frac{x+3/4}{3}$$ so $$0\le c\le \frac{3/4}{3-1}=3/8$$. Thus, the only extreme cycle is {0}. By Theorem 1, $$\mathrm{\mathcal{E}}=\left\{{S}_w1:w\in {\left\{0,3/4\right\}}^{*}\right\}$$ is an orthonormal basis for $$L^2\left({\mu}_B\right)$$. Note also that the numbers $$e^{2\pi i\alpha \left(b,l,c\right)}$$ in formula (Eq. 20) are ±1 because 2πi·BL ⊂ πiℤ.
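
The search for extreme cycle points in this example can be mimicked by a small brute-force check. The following is a minimal sketch, assuming R is an integer, B consists of integers containing 0, and L is given as exact fractions; the function name `extreme_cycle_points` is hypothetical and is not taken from Dutkay et al. (2012). It uses the two facts from the text: |m_B(c)| = 1 forces bc ∈ ℤ for all b ∈ B, and every cycle point lies between min L/(R − 1) and max L/(R − 1).

```python
from fractions import Fraction
from math import ceil, floor, gcd


def extreme_cycle_points(R, B, L):
    """Brute-force search for extreme cycle points of (B, L) on the real line.

    Assumes R is an integer > 1, B is a list of integers containing 0
    (not all zero), and L is a list of Fractions.
    """
    # b*c is an integer for every b in B  iff  c is a multiple of 1/gcd(B)
    step = gcd(*B)
    # the attractor of the maps g_l(x) = (x + l)/R lies in [lo, hi]
    lo, hi = min(L) / (R - 1), max(L) / (R - 1)
    candidates = {
        Fraction(k, step)
        for k in range(ceil(lo * step), floor(hi * step) + 1)
    }

    def on_cycle(c):
        # follow the maps g_l while staying inside the candidate set
        frontier = {c}
        for _ in range(len(candidates)):
            frontier = {(x + l) / R for x in frontier for l in L} & candidates
            if c in frontier:
                return True
        return False

    return {c for c in candidates if on_cycle(c)}


# Middle third Cantor example: R = 3, B = {0, 2}, L = {0, 3/4}
cantor = extreme_cycle_points(3, [0, 2], [Fraction(0), Fraction(3, 4)])
print(cantor)
```

For the data of Example 10 the only candidate in [0, 3/8] is c = 0, which is fixed by g_0, so the search agrees with the conclusion that {0} is the only extreme cycle.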

#### Walsh Bases

In the following, we will focus on the unit interval, which can be regarded as the attractor of a simple IFS and we use step functions for the QMF basis to generate Walsh-type bases for L 2[0, 1] (Dutkay et al. 2012).

### Example 11

Dutkay et al. (2012) The interval [0, 1] is the attractor of the IFS $${\tau}_0x=\frac{x}{2},{\tau}_1x=\frac{x+1}{2}$$, and the invariant measure is the Lebesgue measure on [0, 1]. The map r defined in Example 3 is rx = 2x mod 1. Let $$m_0=1$$, $$m_1={\chi}_{\left[0,1/2\right)}-{\chi}_{\left[1/2,1\right)}$$. It is easy to see that $$\left\{{m}_0,{m}_1\right\}$$ is a QMF basis. Therefore, $$S_0$$, $$S_1$$, defined as in Proposition 5, form a representation of the Cuntz algebra $${\mathcal{O}}_2$$.

### Proposition 12

Dutkay et al. (2012) The set $$\mathrm{\mathcal{E}}:=\left\{{S}_w1:w\in {\left\{0,1\right\}}^{*}\right\}$$ is an orthonormal basis for $$L^2\left[0,1\right]$$, the Walsh basis.

### Proof

We check the conditions in Theorem 1. To see that (i) holds, note that $$S_0 1=1$$. Define $$f(t)=e_t$$, t ∈ ℝ. (ii) is clear. For (iii), we compute
$$\begin{array}{l}{S}_0^{*}{e}_t(x)=\frac{1}{2}\left({e}^{2\pi it\cdot x/2}+{e}^{2\pi it\cdot \left(x+1\right)/2}\right)={e}^{2\pi it\cdot x/2}\frac{1}{2}\left(1+{e}^{2\pi it/2}\right)\\ {}{S}_1^{*}{e}_t(x)=\frac{1}{2}\left({e}^{2\pi it\cdot x/2}-{e}^{2\pi it\cdot \left(x+1\right)/2}\right)={e}^{2\pi it\cdot x/2}\frac{1}{2}\left(1-{e}^{2\pi it/2}\right)\end{array}$$

Thus, (iii) holds with $${\mathfrak{m}}_0(t)=\frac{1}{2}\left(1+{e}^{2\pi it/2}\right)$$, $${\mathfrak{m}}_1(t)=\frac{1}{2}\left(1-{e}^{2\pi it/2}\right)$$, $${g}_0(t)={g}_1(t)=\frac{t}{2}$$. Since e 0 = 1, it follows that (iv) holds.

For (v), take h continuous on ℝ, 0 ≤ h ≤ 1, h(c) = 1, for all c ∈ ℝ with $${e}_c\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}$$, in particular h(0) = 1, and
$$h(t)={\left|\frac{1}{2}\left(1+{e}^{2\pi it/2}\right)\right|}^2h\left(t/2\right)+{\left|\frac{1}{2}\left(1-{e}^{2\pi it/2}\right)\right|}^2h\left(t/2\right)=h\left(t/2\right)$$
Then $$h(t)=h\left(t/{2}^n\right)$$ for all t ∈ ℝ, n ∈ ℕ. Letting n → ∞ and using the continuity of h, we get h(t) = h(0) = 1 for all t ∈ ℝ. Since all conditions hold, we get that ℰ is an orthonormal basis. That ℰ is actually the Walsh basis follows from the following calculations: for |w| = n in {0, 1}*, let $$n={\displaystyle \sum_i}{x}_i{2}^i$$ be the base 2 expansion of n. Because $$S_0f=f\circ r$$, $$S_1f={m}_1\cdot f\circ r$$, and $$m_0\equiv 1$$, we obtain the following decomposition:
$${S}_w1(x)={m}_1\left({r}^{i_1}x\right)\cdot {m}_1\left({r}^{i_2}x\right)\cdots {m}_1\left({r}^{i_k}x\right),\kern.5em \mathrm{where}\kern0.5em {i}_1,{i}_2,\dots {i}_k\kern0.5em \mathrm{correspond}\kern0.5em \mathrm{to}\kern0.5em \mathrm{those}\kern0.5em i\kern0.5em \mathrm{with}\kern0.5em {x}_i=1$$

Also $${m}_1\left({r}^ix\right)={m}_1\left({2}^ix \mod 1\right)$$ are the Rademacher functions, and thus, we obtain the Walsh basis (see, e.g., Schipp et al. 1990).
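
The decomposition into Rademacher factors can be checked numerically. The following is a minimal sketch, assuming NumPy and sampling at midpoints of the dyadic intervals of length 2^(−n), where S_w 1 is constant for words of length n; the helper names `m1` and `S_w_one` are hypothetical.

```python
import itertools

import numpy as np


def m1(x):
    """Filter m_1 = chi_[0,1/2) - chi_[1/2,1)."""
    return np.where(x < 0.5, 1.0, -1.0)


def S_w_one(word, x):
    """Evaluate S_w 1 at the points x, using S_0 f = f∘r and
    S_1 f = m_1 · (f∘r), where r x = 2x mod 1."""
    values = np.ones_like(x)
    y = x.copy()
    for digit in word:
        if digit == 1:
            values = values * m1(y)  # factor m_1(r^i x) for each digit 1
        y = (2.0 * y) % 1.0          # move from r^i x to r^(i+1) x
    return values


n = 3
# midpoints of the 2^n dyadic intervals of length 2^-n
x = (np.arange(2 ** n) + 0.5) / 2 ** n
words = list(itertools.product([0, 1], repeat=n))
F = np.array([S_w_one(w, x) for w in words])

# Gram matrix of the inner products in L^2[0, 1] (exact for these step functions)
gram = F @ F.T / 2 ** n
print(np.allclose(gram, np.eye(2 ** n)))
```

Only words of one fixed length are compared here, because words that differ by trailing zeros give the same function (S_0 1 = 1).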

The Walsh bases can be easily generalized by replacing the matrix
$$\frac{1}{\sqrt{2}}\left(\begin{array}{cc}\hfill 1\hfill & \hfill 1\hfill \\ {}\hfill 1\hfill & \hfill -1\hfill \end{array}\right)$$
which appears in the definition of the filters m 0, m 1, with an arbitrary unitary matrix A with constant first row and by changing the scale from 2 to N.

### Theorem 13

Let N ∈ ℕ, N ≥ 2. Let $$A=\left[{a}_{ij}\right]$$ be an N × N unitary matrix whose first row is constant $$\frac{1}{\sqrt{N}}$$. Consider the IFS $${\tau}_jx=\frac{x+j}{N},x\in \mathrm{\mathbb{R}},j=0,\dots, N-1$$ with the attractor [0, 1] and invariant measure, the Lebesgue measure, on [0, 1]. Define
$${m}_i(x)=\sqrt{N}{\displaystyle \sum_{j=0}^{N-1}}{a}_{ij}{\chi}_{\left[j/N,\left(j+1\right)/N\right]}(x)$$

Then $${\left\{{m}_i\right\}}_{i=0}^{N-1}$$ is a QMF basis. Consider the associated representation of the Cuntz algebra $${\mathcal{O}}_N$$. Then the set $$\mathrm{\mathcal{E}}:=\left\{{S}_w1:w\in {\left\{0,\dots, N-1\right\}}^{*}\right\}$$ is an orthonormal basis for $$L^2\left[0,1\right]$$.

### Proof

We check the conditions in Theorem 1. Let $$f(t)=e_t$$, t ∈ ℝ.

To check (i), note that $$S_0 1\equiv 1$$. (ii) is clear. For (iii), we compute
$${S}_k^{*}{e}_t=\frac{1}{N}{\displaystyle \sum_{j=0}^{N-1}}\overline{m_k}\left({\tau}_jx\right){e}_t\left({\tau}_jx\right)=\frac{1}{\sqrt{N}}{\displaystyle \sum_{j=0}^{N-1}}\overline{a_{kj}}{e}^{2\pi it\cdot \left(x+j\right)/N}={e}^{2\pi it\cdot x/N}\frac{1}{\sqrt{N}}{\displaystyle \sum_{j=0}^{N-1}}\overline{a_{kj}}{e}^{2\pi it\cdot j/N}$$

So (iii) is true with $${\mathfrak{m}}_k(t)=\frac{1}{\sqrt{N}}{\displaystyle \sum_{j=0}^{N-1}}\overline{a_{kj}}{e}^{2\pi it\cdot j/N}$$ and $${g}_k(t)=\frac{t}{N}$$.

(iv) is true with $$c_0=0$$. For (v), take $$h\in \mathcal{C}\left(\mathrm{\mathbb{R}}\right),0\le h\le 1,h(c)=1$$, for all c ∈ ℝ with $${e}_c\in \overline{\mathrm{span}}\mathrm{\mathcal{E}}$$ (in particular h(0) = 1), and
$$h(t)={\displaystyle \sum_{k=0}^{N-1}}\left|{\mathfrak{m}}_k(t)\right|{}^2h\left(t/N\right)=h\left(t/N\right){\displaystyle \sum_{k=0}^{N-1}}\frac{1}{N}\left|{\displaystyle \sum_{j=0}^{N-1}}{a}_{kj}{e}^{-2\pi it\cdot j/N}\right|{}^2=h\left(t/N\right)\cdot \frac{1}{N}\left|\right| Av\left|\right|{}^2$$

where $$v={\left({e}^{-2\pi it\cdot j/N}\right)}_{j=0}^{N-1}$$. Since A is unitary, $${\left\Vert Av\right\Vert}^2={\left\Vert v\right\Vert}^2=N$$. Then $$h(t)=h\left(t/{N}^n\right)$$. Letting n → ∞ and using the continuity of h, we obtain that h(t) = 1 for all t ∈ ℝ. Thus, Theorem 1 implies that ℰ is an orthonormal basis.
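
The identity used in step (v), namely that the squared moduli of the functions 𝔪_k(t) sum to 1, can be checked numerically. The following is a minimal sketch, assuming NumPy and taking for A the N × N discrete Fourier matrix with N = 3 as one convenient unitary matrix with constant first row; this choice of A and the helper name `dual_filters` are illustrative assumptions, not taken from the source.

```python
import numpy as np


def dual_filters(A, t):
    """Return the vector (m_k(t))_k with
    m_k(t) = (1/sqrt(N)) * sum_j conj(a_kj) * exp(2*pi*i*t*j/N)."""
    N = A.shape[0]
    v = np.exp(2j * np.pi * t * np.arange(N) / N)
    return np.conj(A) @ v / np.sqrt(N)


N = 3
j = np.arange(N)
# Discrete Fourier matrix: unitary, and its first row is constant 1/sqrt(N)
A = np.exp(2j * np.pi * np.outer(j, j) / N) / np.sqrt(N)

# sum_k |m_k(t)|^2 = (1/N) * ||conj(A) v||^2 = 1 for every real t
totals = np.array(
    [np.sum(np.abs(dual_filters(A, t)) ** 2) for t in np.linspace(-5.0, 5.0, 41)]
)
print(np.allclose(totals, 1.0))
```

Any other unitary matrix with constant first row could be substituted for A; only unitarity enters the identity.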

### Remark 14

Dutkay et al. (2012) We can read the constants that appear in the step function $$S_w1$$ from the tensor product of A with itself n times, where n is the length of the word w.

Let A be an N × N matrix and B an M × M matrix. Then A ⊗ B has entries:
$$\begin{array}{l}{\left(A\otimes B\right)}_{i_1+N{i}_2,{j}_1+N{j}_2}={a}_{i_1{j}_1}{b}_{i_2{j}_2},\kern1em {i}_1,{j}_1=0,\dots, N-1,\kern0.5em {i}_2,{j}_2=0,\dots, M-1\\ {}\kern1.32em A\otimes B=\left(\begin{array}{cccc}\hfill A{b}_{0,0}\hfill & \hfill A{b}_{0,1}\hfill & \hfill \cdots \hfill & \hfill A{b}_{0,M-1}\hfill \\ {}\hfill A{b}_{1,0}\hfill & \hfill A{b}_{1,1}\hfill & \hfill \cdots \hfill & \hfill A{b}_{1,M-1}\hfill \\ {}\hfill \vdots \hfill & \hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill A{b}_{M-1,0}\hfill & \hfill A{b}_{M-1,1}\hfill & \hfill \cdots \hfill & \hfill A{b}_{M-1,M-1}\hfill \end{array}\right)\end{array}$$

The matrix $$A^{\otimes n}$$ is obtained by induction, tensoring to the left: $$A^{\otimes n}=A\otimes {A}^{\otimes \left(n-1\right)}$$.

Thus, A ⊗ A ⊗ … ⊗ A, n times, has entries
$${A}_{i_0+N{i}_1+{N}^2{i}_2+\dots +{N}^{n-1}{i}_{n-1},{j}_0+N{j}_1+\dots +{N}^{n-1}{j}_{n-1}}^{\otimes n}={a}_{i_0{j}_0}{a}_{i_1{j}_1}\dots {a}_{i_{n-1}{j}_{n-1}}$$
Now compute for $${i}_0,\dots, {i}_{n-1}\in \left\{0,\dots, N-1\right\}$$:
$${S}_{i_0\dots {i}_{n-1}}1(x)={m}_{i_0}(x){m}_{i_1}(rx)\dots {m}_{i_{n-1}}\left({r}^{n-1}x\right)$$

Suppose $$x\in \left[\frac{k}{N^n},\frac{k+1}{N^n}\right),0\le k<{N}^n$$ and $$k={N}^{n-1}{j}_0+{N}^{n-2}{j}_1+\dots +N{j}_{n-2}+{j}_{n-1}$$, where $$0\le {j}_0,\dots, {j}_{n-1}<N$$.

Then $$x\in \left[\frac{j_0}{N},\frac{j_0+1}{N}\right)$$, $$rx=(Nx) \mod 1\in \left[\frac{j_1}{N},\frac{j_1+1}{N}\right),\dots, {r}^{n-1}x=\left({N}^{n-1}x\right) \mod 1\in \left[\frac{j_{n-1}}{N},\frac{j_{n-1}+1}{N}\right)$$, so $${m}_{i_0}(x)=\sqrt{N}{a}_{i_0{j}_0}$$, $${m}_{i_1}(rx)=\sqrt{N}{a}_{i_1{j}_1},\dots, {m}_{i_{n-1}}\left({r}^{n-1}x\right)=\sqrt{N}{a}_{i_{n-1}{j}_{n-1}}$$; hence,
$${S}_{i_0\dots {i}_{n-1}}1(x)=\sqrt{N^n}{a}_{i_0{j}_0}\dots {a}_{i_{n-1}{j}_{n-1}}=\sqrt{N^n}{A}_{i_0+N{i}_1+{N}^2{i}_2+\dots +{N}^{n-1}{i}_{n-1},{j}_0+N{j}_1+\dots +{N}^{n-1}{j}_{n-1}}^{\otimes n}$$

### Example 15

Dutkay et al. (2012) The pictures in Fig. 8 show the Walsh functions that correspond to the scale N = 4 and the matrix
$$A=\left(\begin{array}{cccc}\hfill \frac{1}{2}\hfill & \hfill \frac{1}{2}\hfill & \hfill \frac{1}{2}\hfill & \hfill \frac{1}{2}\hfill \\ {}\hfill \frac{\sqrt{2}}{2}\hfill & \hfill -\frac{\sqrt{2}}{2}\hfill & \hfill 0\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill 0\hfill & \hfill \frac{\sqrt{2}}{2}\hfill & \hfill -\frac{\sqrt{2}}{2}\hfill \\ {}\hfill \frac{1}{2}\hfill & \hfill \frac{1}{2}\hfill & \hfill -\frac{1}{2}\hfill & \hfill -\frac{1}{2}\hfill \end{array}\right)$$
for the words of length 2, indicated at the top.
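
The relation in Remark 14 between the values of S_w 1 and the entries of the n-fold tensor power of A can be verified for this matrix. The following is a minimal sketch, assuming NumPy; `np.kron` is used for the tensor power (its block ordering differs from the block display in Remark 14, which does not matter when all factors equal A), and the helper name `S_word_one` is hypothetical.

```python
import itertools

import numpy as np

s = np.sqrt(2) / 2
A = np.array([
    [0.5, 0.5, 0.5, 0.5],
    [s, -s, 0.0, 0.0],
    [0.0, 0.0, s, -s],
    [0.5, 0.5, -0.5, -0.5],
])
N, n = 4, 2


def S_word_one(word, x):
    """S_w 1(x) = m_{i_0}(x) m_{i_1}(r x) ..., where m_i equals
    sqrt(N) * a_ij on [j/N, (j+1)/N) and r x = N x mod 1."""
    value, y = 1.0, x
    for i in word:
        j = int(N * y)                 # interval [j/N, (j+1)/N) containing y
        value *= np.sqrt(N) * A[i, j]
        y = (N * y) % 1.0
    return value


# n-fold tensor (Kronecker) power of A
A_n = A
for _ in range(n - 1):
    A_n = np.kron(A, A_n)

matches = []
for word in itertools.product(range(N), repeat=n):
    for k in range(N ** n):
        x = (k + 0.5) / N ** n                                   # point in [k/N^n, (k+1)/N^n)
        j_digits = [(k // N ** (n - 1 - m)) % N for m in range(n)]  # k = N^(n-1) j_0 + ... + j_(n-1)
        row = sum(i * N ** m for m, i in enumerate(word))        # i_0 + N i_1 + ...
        col = sum(j * N ** m for m, j in enumerate(j_digits))    # j_0 + N j_1 + ...
        matches.append(np.isclose(S_word_one(word, x), np.sqrt(N ** n) * A_n[row, col]))

print(all(matches))
```

Each of the 16 words of length 2 yields a step function that is constant on the 16 intervals of length 1/16, with values given by 4 times a row of the tensor square of A.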

## List of Names and Discoveries

Many of the main discoveries summarized below are now lore.
| Year | Discovery | Name and field |
|---|---|---|
| 1807 | Expressing functions as sums of sine and cosine waves of frequencies in arithmetic progression (now called Fourier series) | Jean Baptiste Joseph Fourier: mathematics, physics (heat conduction) |
| 1909 | Discovered, while a student of David Hilbert, an orthonormal basis consisting of step functions, applicable both to functions on an interval and functions on the whole real line. While it was not realized at the time, Haar’s construction was a precursor of what is now known as the Mallat subdivision and multiresolution method, as well as the subdivision wavelet algorithms | Alfred Haar: mathematics |
| 1946 | Discovered basis expansions for what might now be called time-frequency wavelets, as opposed to time-scale wavelets | Denes Gabor (Nobel Prize): physics (optics, holography) |
| 1948 | A rigorous formula used by the phone company for sampling speech signals. Quantizing information and entropy and founder of what is now called the mathematical theory of communication | Claude Elwood Shannon: mathematics, engineering (information theory) |
| 1976 | Discovered subband coding of digital transmission of speech signals over the telephone | Claude Garland, Daniel Esteban (both): signal processing |
| 1981 | Suggested the term “ondelettes.” J. M. decomposed reflected seismic signals into sums of “wavelets (Fr. ondelettes) of constant shape,” i.e., a decomposition of signals into wavelet shapes, selected from a library of such shapes (now called wavelet series). Received somewhat late recognition for his work. Due to contributions by A. Grossman and Y. Meyer, Morlet’s discoveries have now come to play a central role in the theory | Jean Morlet: petroleum engineer |
| 1985 | Mentor for A. Cohen, S. Mallat, and others of the wavelet pioneers, Y. M. discovered infinitely often differentiable wavelets | Yves Meyer: mathematics, applications |
| 1989 | Discovered the use of wavelet filters in the analysis of wavelets – the so-called Cohen condition for orthogonality | Albert Cohen: mathematics (orthogonality relations), numerical analysis |
| 1986 | Discovered what is now known as the subdivision and multiresolution method, as well as the subdivision wavelet algorithms. This allowed the effective use of operators in the Hilbert space L²(ℝ) and of the parallel computational use of recursive matrix algorithms | Stephane Mallat: mathematics, signal and image processing |
| 1987 | Discovered differentiable wavelets, with the number of derivatives roughly half the length of the support interval. Further found polynomial algorithms for their construction (with coauthor Jeff Lagarias, joint spectral radius formulas) | Ingrid Daubechies: mathematics, physics, and communications |
| 1991 | Discovered the use of a transfer operator in the analysis of wavelets: orthogonality and smoothness | Wayne Lawton: mathematics (the wavelet transfer operator) |
| 1992 | C. Brislawn and his group at Los Alamos created the theory and the codes which allowed the compression of the enormous FBI fingerprint file, creating A/D, a new database of fingerprints | The FBI using wavelet algorithms in digitizing and compressing fingerprints |
| 2000 | A wavelet-based picture compression standard, called JPEG 2000, for digital encoding of images | The International Standards Organization |
| 1994 | Pioneered the use of wavelet bases and tools from statistics to “denoise” images and signals | David Donoho: statistics, mathematics |

## History

Wavelets, as they have appeared in the mathematics literature (e.g., Daubechies 1992) for a long time, starting with Haar in 1909, involve function spaces; the connections to a host of discrete problems from engineering are more subtle. Moreover, the deeper connections between the discrete algorithms and the function spaces of mathematical analysis are of a more recent vintage; see, e.g., Strang and Nguyen (1996) and Jorgensen (2006a).

Here we begin with the function spaces. This part of wavelet theory refers to continuous wavelet transforms (details below). It dominated the wavelet literature in the 1980s and is beautifully treated in the first four chapters in Daubechies (1992) and in Daubechies (1993). The word “continuous” refers to the continuum of the real line ℝ. Here we consider spaces of functions in one or more real dimensions, i.e., functions on the line ℝ (signals), the plane ℝ² (images), or, in higher dimensions $${\mathrm{\mathbb{R}}}^d$$, functions of d real variables.

## Literature

As evidenced by a simple Google check, the mathematical wavelet literature is gigantic in size, and the manifold applications spread over a vast number of engineering journals. While we cannot do justice to this voluminous literature, we offer instead a collection of the classics (Heil and Walnut 2006) edited recently by C. Heil et al.

## Notes

### Acknowledgments

We thank Professors Dorin Dutkay, Gabriel Picioroaga, and Judy Packer for the helpful discussions.

## Bibliography

1. Aubert G, Kornprobst P (2006) Mathematical problems in image processing. Springer, New York
2. Baggett L, Jorgensen P, Merrill K, Packer J (2005) A non-MRA Cr frame wavelet with rapid decay. Acta Appl Math 89:251–270
3. Baladi V (2000) Positive transfer operators and decay of correlations, vol 16, Advanced series in nonlinear dynamics. World Scientific, River Edge
4. Bratteli O, Jorgensen P (2002) Wavelets through a looking glass: the world of the spectrum. Birkhäuser, Boston
5. Braverman M (2006) Parabolic Julia sets are polynomial time computable. Nonlinearity 19(6):1383–1401
6. Braverman M, Yampolsky M (2006) Non-computable Julia sets. J Am Math Soc 19(3):551–578 (electronic)
7. Bredies K, Lorenz DA, Maass P (2006) An optimal control problem in medical image processing
8. Daubechies I (1992) Ten lectures on wavelets, vol 61, CBMS-NSF regional conference series in applied mathematics. Society for Industrial and Applied Mathematics, Philadelphia
9. Daubechies I (1993) Wavelet transforms and orthonormal wavelet bases. Proc Sympos Appl Math
10. Devaney RL, Look DM (2006) A criterion for Sierpinski curve Julia sets. Topol Proc 30(1):163–179, Spring topology and dynamical systems conference
11. Devaney RL, Rocha MM, Siegmund S (2007) Rational maps with generalized Sierpinski gasket Julia sets. Topol Appl 154(1):11–27
12. Dutkay DE (2004) The spectrum of the wavelet Galerkin operator. Integral Equ Oper Theory 50:477–487
13. Dutkay DE, Jorgensen PET (2005) Wavelet constructions in non-linear dynamics. Electron Res Announc Am Math Soc 11:21–23
14. Dutkay DE, Jorgensen PET (2006a) Wavelets on fractals. Rev Mat Iberoamericana 22:131–180
15. Dutkay DE, Jorgensen PET (2006b) Hilbert spaces built on a similarity and on dynamical renormalization. J Math Phys 47:053504
16. Dutkay DE, Jorgensen PET (2006c) Iterated function systems, Ruelle operators, and invariant projective measures. Math Comput 75:1931
17. Dutkay DE, Roysland K (2007) The algebra of harmonic functions for a matrix-valued transfer operator. arXiv:math/0611539
18. Dutkay DE, Roysland K (2007) Covariant representations for matrix-valued transfer operators. arXiv:math/0701453
19. Dutkay DE, Picioroaga G, Song M-S (2012) Orthonormal bases generated by Cuntz algebras. arXiv:1212.4134
20. Heil C, Walnut DF (eds) (2006) Fundamental papers in wavelet theory. Princeton University Press, Princeton
21. Hutchinson JE (1981) Fractals and self-similarity. Indiana Univ Math J 30(5):713–747
22. Daubechies I, Lagarias JC (1992) Two-scale difference equations. II. Local regularity, infinite products of matrices and fractals. SIAM J Math Anal
23. Jorgensen PET (2003) Matrix factorizations, algorithms, wavelets. Not Am Math Soc 50:880–895
24. Jorgensen PET (2006a) Analysis and probability: wavelets, signals, fractals, vol 234, Graduate texts in mathematics. Springer, New York
25. Jorgensen PET (2006b) Certain representations of the Cuntz relations, and a question on wavelets decompositions. Contemp Math 414:165–188
26. Liu F (2006) Diffusion filtering in image processing based on wavelet transform. Sci China Ser F 49:1–25
27. Milnor J (2004) Pasting together Julia sets: a worked out example of mating. Exp Math 13(1):55–92
28. Petersen CL, Zakeri S (2004) On the Julia set of a typical quadratic polynomial with a Siegel disk. Ann Math 159(1):1–52
29. Schipp F, Wade WR, Simon P (1990) Walsh series. Adam Hilger Ltd., Bristol. An introduction to dyadic harmonic analysis, With the collaboration of J. Pál
30. Skodras A, Christopoulos C, Ebrahimi T (2001) JPEG 2000 still image compression standard. IEEE Signal Process Mag 18:36–58
31. Song M-S (2006a) Wavelet image compression. PhD thesis, The University of Iowa
32. Song M-S (2006b) Wavelet image compression. In: Operator theory, operator algebras, and applications, vol 414, Contemporary mathematics. American Mathematical Society, Providence, pp 41–73
33. Strang G (1997) Wavelets from filter banks. Springer, New York
34. Strang G (2000) Signal processing for everyone. Lecture notes in mathematics, Springer, vol 1739
35. Strang G, Nguyen T (1996) Wavelets and filter banks. Wellesley-Cambridge Press, Wellesley
36. Usevitch BE (2001) A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000. IEEE Signal Process Mag 18:22–35
37. Walker JS (1999) A primer on wavelets and their scientific applications. Chapman & Hall/CRC, Boca Raton