eGHWT: The Extended Generalized Haar-Walsh Transform

Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important, and much research has been done recently. The Generalized Haar-Walsh Transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh-Hadamard Transforms. We propose the extended Generalized Haar-Walsh Transform (eGHWT), which is a generalization of the adapted time-frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of "sequency-domain" partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT at a similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot(1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples, including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows, and demonstrate its effectiveness on applications such as image approximation.


Introduction
In recent years, research on graphs and networks has experienced rapid growth due to a confluence of several trends in science and technology: the advent of new sensors, measurement technologies, and social network infrastructure has provided huge opportunities to visualize complicated interconnected network structures, record data of interest at various locations in such networks, analyze such data, and make inferences and diagnostics. We can easily observe such network-based problems in truly diverse fields: biology and medicine (e.g., connectomes); computer science (e.g., social networks); electrical engineering (e.g., sensor networks); hydrology and geology (e.g., ramified river networks); and civil engineering (e.g., road networks), to name just a few. Consequently, there is an explosion of interest in and demand for analyzing data sampled on such graphs and networks, activities often called "network data analysis" or "graph signal processing"; see, e.g., the recent books [3,8,25,30] and survey articles [32,45] for evidence of this trend. This trend has pushed the signal processing and applied mathematics communities to extend classical techniques on regular domains to the setting of graphs. Much effort has been devoted to developing wavelet transforms for signals on graphs (the so-called graph signals) [2,5,6,12,19,22,28,36,47,51]. Comprehensive reviews of transforms for signals on graphs have also been written [32,45].
The Generalized Haar-Walsh Transform (GHWT) [14,15,18], developed by Irion and Saito, has achieved superior results over other transforms in terms of both approximation and denoising of signals on graphs (or graph signals for short). It is a generalization of the classical Haar and Walsh-Hadamard Transforms. In this article, we propose and develop the extended Generalized Haar-Walsh Transform (eGHWT). The eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT at a similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the previous GHWT best-basis algorithm seeks the most suitable orthonormal basis (ONB) for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618 \cdot (1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. It can be extended to 2D signals and matrix-form data in a more subtle way than the GHWT. In this article, we describe the details of the eGHWT best-basis algorithm and demonstrate its superiority. Moreover, we showcase the versatility of the eGHWT by applying it to genuine graph signals and classical digital images.
The organization of this article is as follows. In Sect. 2, we review background concepts, including graph signal processing and recursive graph partitioning, which is a common strategy used by researchers to develop graph signal transforms. In Sect. 3, the GHWT and its best-basis algorithm are reviewed. In Sect. 4, we provide an overview of the eGHWT. We start by reviewing the algorithm developed by [49]. Then we illustrate how that algorithm can be modified to construct the eGHWT. A simple example illustrating the difference between the GHWT and the eGHWT is given. We finish the section by explaining how the eGHWT can be extended to 2D signals and matrix-form data. In Sect. 5, we demonstrate the superiority of the eGHWT over the GHWT (including the graph Haar and Walsh bases) using real datasets. Section 6 concludes this article and discusses potential future projects.
We note that most of the figures in this article are reproducible. Interested readers can find our scripts to generate the figures (written in the Julia programming language [1]) at our software website: https://github.com/UCD4IDS/MultiscaleGraphSignalTransforms.jl; in particular, see its subfolder, test/paperscripts/eGHWT2021.

Basics of Spectral Graph Theory and Notation
In this section, we review some fundamentals of spectral graph theory and introduce the notation that will be used throughout this article.
A graph is a pair $G = (V, E)$, where $V = V(G) = \{v_1, v_2, \ldots, v_N\}$ is the vertex (or node) set of $G$, and $E = E(G) = \{e_1, e_2, \ldots, e_M\}$ is the edge set, where each edge connects two nodes $v_i, v_j$ for some $1 \le i \ne j \le N$. We only deal with finite $N$ and $M$ in this article. For simplicity, we often write $i$ instead of $v_i$. An edge connecting a node and itself is called a loop. If there exists more than one edge connecting some $v_i, v_j$, then they are called multiple edges. A graph having loops or multiple edges is called a multiple graph (or multigraph); a graph with neither of these is called a simple graph. A directed graph is a graph in which edges have orientations while an undirected graph is a graph in which edges do not have orientations. If each edge $e \in E$ has a weight (normally nonnegative), then $G$ is called a weighted graph. A path from $v_i$ to $v_j$ in a graph $G$ is a subgraph of $G$ consisting of a sequence of distinct nodes starting with $v_i$ and ending with $v_j$ such that consecutive nodes are adjacent. A path starting from $v_i$ that returns to $v_i$ (but is not a loop) is called a cycle. For any two distinct nodes in $G$, if there is a path connecting them, then such a graph is said to be connected. In this article, we mainly consider undirected weighted simple connected graphs. Our method can be easily adapted to other undirected graphs, but we do not consider directed graphs here.
Sometimes, each node is associated with spatial coordinates in $\mathbb{R}^d$. For example, if we want to analyze a network of sensors and build a graph whose nodes represent the sensors under consideration, then these nodes have spatial coordinates in $\mathbb{R}^2$ or $\mathbb{R}^3$ indicating their current locations. In that case, we write $x[i] \in \mathbb{R}^d$ for the location of node $i$. Denote a function supported on the graph $G$ as $f = (f[1], \ldots, f[N])^{\mathsf{T}} \in \mathbb{R}^N$. It is a data vector (often called a graph signal) where $f[i] \in \mathbb{R}$ is the value measured at the node $i$ of the graph. We now discuss several matrices associated with undirected simple graphs. The information in both $V$ and $E$ is captured by the edge weight matrix $W(G) \in \mathbb{R}^{N \times N}$, where $w_{ij} \ge 0$ is the edge weight between nodes $i$ and $j$. In an unweighted graph, $w_{ij}$ is restricted to be either 0 or 1, depending on whether nodes $i$ and $j$ are adjacent, and we may refer to $W(G)$ as an adjacency matrix. In a weighted graph, $w_{ij}$ indicates the affinity between $i$ and $j$. In either case, since $G$ is undirected, $W(G)$ is a symmetric matrix. We then define the degree matrix $D(G) := \operatorname{diag}(d_1, \ldots, d_N)$, where $d_i := \sum_j w_{ij}$. With this in place, we are now able to define the (unnormalized) Laplacian matrix, random-walk normalized Laplacian matrix, and symmetric normalized Laplacian matrix, respectively, as $L(G) := D - W$, $L_{\mathrm{rw}}(G) := D^{-1} L$, and $L_{\mathrm{sym}}(G) := D^{-1/2} L D^{-1/2}$. See [26] for the details of the relationship between these three matrices and their spectral properties. We use $0 = \lambda_0 < \lambda_1 \le \cdots \le \lambda_{N-1}$ to denote the sorted Laplacian eigenvalues and $\phi_0, \phi_1, \ldots, \phi_{N-1}$ to denote the corresponding eigenvectors, where the specific Laplacian matrix to which they refer will be clear from either context or subscripts.
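As a concrete illustration, the three Laplacian matrices of a small weighted triangle graph can be computed as follows (a minimal sketch in Python with NumPy; the edge weights are arbitrary):

```python
import numpy as np

# Edge weight matrix of a weighted triangle graph (illustrative numbers).
W = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 0.]])
d = W.sum(axis=1)                                  # degrees d_i = sum_j w_ij
D = np.diag(d)                                     # degree matrix
L = D - W                                          # unnormalized Laplacian
L_rw = np.diag(1.0 / d) @ L                        # random-walk normalized
L_sym = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)    # symmetric normalized
print(L.sum(axis=1))                               # each row of L sums to zero
```

Note that $L_{\mathrm{rw}}$ is not symmetric in general, but it shares its eigenvalues with the symmetric matrix $L_{\mathrm{sym}}$ via the similarity transform $L_{\mathrm{rw}} = D^{-1/2} L_{\mathrm{sym}} D^{1/2}$.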
Laplacian eigenvectors can then be used for graph partitioning. Spectral clustering [26] performs $k$-means on the first few eigenvector coordinates to partition the graph. This approach is justified by the fact that it is an approximate minimizer of the graph-cut criterion called the RatioCut [11] (or the Normalized Cut [44]) when $L$ (or $L_{\mathrm{rw}}$, respectively) is used. Suppose the nodes in $V(G)$ are partitioned into two disjoint sets $X$ and $Y$. Then $\mathrm{cut}(X, Y) := \sum_{i \in X, j \in Y} w_{ij}$ indicates the quality of the cut (the smaller this quantity, the better the cut in general), $\mathrm{vol}(X) := \sum_{i \in X} d_i$ is the so-called volume of the set $X$, and $|X|$ is the cardinality of (i.e., the number of nodes in) $X$. To reduce the computational complexity (as we did for the GHWT construction), we only use the Fiedler vector [9], i.e., the eigenvector $\phi_1$ corresponding to the smallest nonzero eigenvalue $\lambda_1$, to bipartition a given graph (or subgraph) in this article. For a connected graph $G$, Fiedler showed that its Fiedler vector partitions the vertices into two sets, e.g., by letting $V_1 := \{i \in V : \phi_1[i] \ge 0\}$ and $V_2 := V \setminus V_1$, such that the subgraphs induced on $V_1$ and $V_2$ by $G$ are both connected graphs [9].
In this article, we use the Fiedler vector of $L_{\mathrm{rw}}$ of a given graph and its subgraphs unless stated otherwise. See, e.g., [26], which suggests the use of the Fiedler vector of $L_{\mathrm{rw}}$ for spectral clustering over that of the other Laplacian matrices.

Recursive Partitioning of Graphs
The foundation upon which the GHWT and the eGHWT are constructed is a binary partition tree (also known as a hierarchical bipartition tree) of an input graph $G(V, E)$: a set of tree-structured subgraphs of $G$ constructed by recursively bipartitioning $G$. This bipartitioning operation ideally splits each subgraph into two smaller subgraphs that are roughly equal in size while keeping tightly-connected nodes grouped together. More specifically, let $G^j_k$ denote the $k$th subgraph on level $j$ of the binary partition tree of $G$ and $V^j_k := V(G^j_k)$, where $j, k \in \mathbb{Z}_{\ge 0}$. Note $G^0_0 = G$ and $V^0_0 = V$, i.e., level $j = 0$ represents the root node of this tree. Then the two children of $G^j_k$ in the tree, $G^{j+1}_{k'}$ and $G^{j+1}_{k'+1}$, are obtained through partitioning $G^j_k$ using the Fiedler vector of $L_{\mathrm{rw}}(G^j_k)$. The graph partitioning is recursively performed until each subgraph corresponding to a leaf contains only one node. Note that $K^j = 2^j$ if the resulting binary partition tree is a perfect binary tree, where $K^j$ denotes the number of subgraphs on level $j$.
In general, other spectral clustering methods with a different number of eigenvectors or different Laplacian matrices are applicable as well. However, we impose the following five conditions on the binary partition tree:
1. The root of the tree is the entire graph, i.e., $G^0_0 = G$;
2. The leaves of the tree are single-node graphs, i.e., $|V^{j_{\max}}_k| = 1$ for each $k$, where $j_{\max}$ is the height of the tree;
3. All regions (i.e., node sets of the subgraphs) on the same level are disjoint, i.e., $V^j_k \cap V^j_{k'} = \emptyset$ if $k \ne k'$;
4. Each subgraph with more than one node is partitioned into exactly two subgraphs;
5. (Optional) In practice, the sizes of the two children, $G^{j+1}_{k'}$ and $G^{j+1}_{k'+1}$, should not be too far apart, to reduce inefficiency.
Even other (non-spectral) graph cut methods can be used to form the binary partition tree, as long as those conditions are satisfied. The flexibility in the choice of graph partitioning methods in the GHWT/eGHWT construction is certainly advantageous.
We demonstrate two examples illustrating the binary partition tree here. The first one is a simple 6-node path graph, $P_6$. It has five edges with equal weights connecting adjacent nodes. Figure 1 shows the binary partition tree formed on this graph. In the first iteration, it is bipartitioned into two subgraphs with 3 nodes each. Then each of those 3-node graphs is bipartitioned into a 2-node graph and a 1-node graph. In the end, the subgraphs are all 1-node graphs.
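The recursive bipartitioning described above can be sketched as follows (an illustrative Python/NumPy implementation; the helper names fiedler_split and partition_tree are ours, and the Fiedler vector of $L_{\mathrm{rw}}$ is computed via the symmetric normalized Laplacian):

```python
import numpy as np

def fiedler_split(nodes, W):
    """Bipartition the subgraph induced on `nodes` by the sign of the Fiedler
    vector of its random-walk Laplacian, computed via L_sym and the
    substitution v_rw = D^{-1/2} v_sym."""
    Wsub = W[np.ix_(nodes, nodes)]
    d = Wsub.sum(axis=1)
    Di = np.diag(d**-0.5)
    evals, evecs = np.linalg.eigh(Di @ (np.diag(d) - Wsub) @ Di)
    v = Di @ evecs[:, 1]                 # second-smallest eigenvalue's vector
    left = [n for n, pos in zip(nodes, v >= 0) if pos]
    right = [n for n, pos in zip(nodes, v >= 0) if not pos]
    return (left, right) if left and right else (nodes[:1], nodes[1:])

def partition_tree(nodes, W):
    """Recursively bipartition until every leaf is a single node; the tree is
    returned as nested tuples with node-id lists at the leaves."""
    if len(nodes) == 1:
        return nodes
    a, b = fiedler_split(nodes, W)
    return (partition_tree(a, W), partition_tree(b, W))

# P6: unweighted path graph on 6 nodes
W = np.diag(np.ones(5), 1); W = W + W.T
tree = partition_tree(list(range(6)), W)
print(tree)   # first split: {0,1,2} vs {3,4,5}, as in Fig. 1
```

The first split is guaranteed to be balanced here because the Fiedler vector of a path graph is monotone and, by symmetry, antisymmetric about the center.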
The second example is the street network of the City of Toronto, which we obtained from its open data portal. Using the street names and intersection coordinates included in the dataset, we constructed the graph representing the street network there with $N = 2275$ nodes and $M = 3381$ edges. Each edge weight was set as the reciprocal of the Euclidean distance between the endpoints of that edge.
Fig. 1: An example of a hierarchical bipartition tree for a path graph with $N = 6$ nodes, where the edge weights are equal. The root is the whole graph.
Fig. 2: A demonstration of the hierarchical bipartition tree on the Toronto street network. On the same level, different colors correspond to different regions. The situation at level $j = 0$ is not shown since there is no partition at $j = 0$.
Figure 2 gives us a visualization of the first three levels of the binary partition tree on this Toronto street network.

The Generalized Haar-Walsh Transform (GHWT)
In this section, we will review the Generalized Haar-Walsh Transform (GHWT) [14,15,18].It is a multiscale transform for graph signals and a true generalization of the classical Haar and Walsh-Hadamard transforms: if an input graph is a simple path graph whose number of nodes is dyadic, then the GHWT reduces to the classical counterpart exactly.

Overcomplete Dictionaries of Bases
After the binary partition tree of the input graph with depth $j_{\max}$ is generated, an overcomplete dictionary of basis vectors is composed. Each basis vector is denoted as $\psi^j_{k,l}$, where $j \in [0, j_{\max}]$ denotes the level, $k \in [0, K^j)$ denotes the region, and $l$ denotes the tag; $K^j$ is the number of subgraphs of $G$ on level $j$. The tag assumes a distinct integer value within the range $[0, 2^{j_{\max}-j})$. The tag $l$, when expressed in binary, specifies the sequence of averaging and differencing operations by which $\psi^j_{k,l}$ was generated. For example, $l = 6$ written in binary is 110, which means that the basis vector/expansion coefficient was produced by two differencing operations (two 1s) followed by an averaging operation (one 0). Generally speaking, a larger value of $l$ indicates more oscillations in $\psi^j_{k,l}$, with exceptions when imbalances occur in the recursive partitioning. We refer to basis vectors with tag $l = 0$ as scaling vectors, those with tag $l = 1$ as Haar vectors, and those with tag $l \ge 2$ as Walsh vectors.
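As a quick illustration, the tag's binary digits can be decoded as follows (the helper name tag_to_ops is ours):

```python
def tag_to_ops(l, nbits=None):
    """Read the tag's binary digits left to right: 1 = differencing,
    0 = averaging.  Pass nbits = jmax - j to keep leading averaging steps,
    which plain binary notation would drop as leading zeros."""
    bits = format(l, 'b' if nbits is None else f'0{nbits}b')
    return ['difference' if b == '1' else 'average' for b in bits]

print(tag_to_ops(6))      # ['difference', 'difference', 'average']
print(tag_to_ops(1, 3))   # ['average', 'average', 'difference']
```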
The GHWT begins by defining an orthonormal basis on level $j_{\max}$ and obtaining the corresponding expansion coefficients. The standard basis of $\mathbb{R}^N$ is used here since each region at level $j_{\max}$ is a 1-node graph: $\psi^{j_{\max}}_{k,0} := \delta_{i(k)} \in \mathbb{R}^N$, where $k \in [0, N)$, $K^{j_{\max}} = N$, $i(k)$ is the sole node in $V^{j_{\max}}_k$, and $\delta_i$ is the indicator vector of node $i$, i.e., $\delta_i[i'] = 0$ if $i' \ne i$ and $\delta_i[i] = 1$. The expansion coefficients $\{c^{j_{\max}}_{k,0}\}_{k=0:N-1}$ are then simply the reordered input signal $f$. From here the algorithm proceeds recursively, and the basis vectors and expansion coefficients on level $j-1$ are computed from those on level $j$. The GHWT proceeds as in Algorithm 1.
Note that when analyzing the input signal $f$, we generally only need to compute the expansion coefficients without generating the basis vectors in Algorithm 1.
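One merging step of this recursion, for two sibling regions with equal numbers of coefficients, can be sketched as follows (merge_coeffs is our illustrative helper following the coefficient-combination rules of the GHWT [14,15,18]; children with unequal numbers of tags additionally require the leftover-coefficient rule described there):

```python
import numpy as np

def merge_coeffs(c1, n1, c2, n2):
    """One GHWT merging step (a sketch): combine the coefficient lists of two
    sibling regions of sizes n1 and n2 into their parent's coefficients,
    ordered by tag.  c1[l] and c2[l] are the children's tag-l coefficients."""
    n = n1 + n2
    parent = [(np.sqrt(n1) * c1[0] + np.sqrt(n2) * c2[0]) / np.sqrt(n),  # tag 0: scaling
              (np.sqrt(n2) * c1[0] - np.sqrt(n1) * c2[0]) / np.sqrt(n)]  # tag 1: Haar
    for l in range(1, min(len(c1), len(c2))):                            # tags >= 2: Walsh
        parent.append((c1[l] + c2[l]) / np.sqrt(2))                      # tag 2l
        parent.append((c1[l] - c2[l]) / np.sqrt(2))                      # tag 2l + 1
    return parent

# Bottom-up pass on a 4-node path with signal f = [1, 2, 3, 4]:
# the finest level holds the standard-basis coefficients (the signal itself).
A = merge_coeffs([1.0], 1, [2.0], 1)   # region {nodes 1, 2}
B = merge_coeffs([3.0], 1, [4.0], 1)   # region {nodes 3, 4}
root = merge_coeffs(A, 2, B, 2)        # whole graph
print(root)  # approx [5.0, -2.0, -1.0, 0.0]; Parseval: 25 + 4 + 1 = 30 = ||f||^2
```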
For the dictionary of basis vectors, several remarks are in order.
1. The basis vectors on each level are localized. In other words, $\psi^j_{k,l}$ is supported on $V^j_k$. If $V^j_k \cap V^j_{k'} = \emptyset$, then the basis vectors $\{\psi^j_{k,l}\}_l$ and $\{\psi^j_{k',l'}\}_{l'}$ are mutually orthogonal.
2. The basis vectors on $V^j_k$ span the same subspace as the union of those on its two children regions on level $j + 1$.
3. The depth of the dictionary is the same as that of the binary partition tree, which is approximately $O(\log N)$ if the tree is nearly balanced. There are $N$ vectors on each level, so the total number of basis vectors is approximately $O(N \log N)$.
Note that Algorithm 1 groups the GHWT basis vectors by region (i.e., the index $k$) and arranges them from the coarse scale to the fine scale, which we call the coarse-to-fine (c2f) dictionary. Alternatively, we can group them by tag (i.e., the index $l$) and reverse the order of the levels (i.e., scales), which we call the fine-to-coarse (f2c) dictionary [14,15,18]. The c2f dictionary corresponds to a collection of basis vectors obtained by recursively partitioning the "time" domain information of the input graph signal while the f2c dictionary corresponds to those obtained by recursively partitioning the "frequency" (or "sequency") domain information of the input graph signal. Each dictionary contains more than $(1.5)^N$ choosable ONBs; see, e.g., Thiele and Villemoes [49] for the details on how to compute or estimate this number. Note, however, that exceptions can occur when the recursive partitioning generates a highly imbalanced tree. Figure 3 shows these dictionaries for $P_6$. Figure 4 shows some basis vectors from the GHWT dictionary computed on the Toronto street network.
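For a perfect binary partition tree, the number of choosable ONBs in a single dictionary obeys the classical recursion $B(1) = 1$, $B(2n) = B(n)^2 + 1$ (for each internal region, either keep that region's own basis or pick any pair of bases for its two children), which can be checked numerically (an illustrative sketch; num_onbs is our helper name):

```python
def num_onbs(N):
    """Number of ONBs selectable from one dictionary built on a perfect
    binary tree with N = 2^J leaves: B(1) = 1 and B(2n) = B(n)**2 + 1."""
    B, n = 1, 1
    while n < N:
        B = B * B + 1
        n *= 2
    return B

print([num_onbs(2**j) for j in range(1, 6)])  # [2, 5, 26, 677, 458330]
```

Comparing with $(1.5)^N$: already for $N = 8$ we have $26 > 1.5^8 \approx 25.6$, and the gap widens quickly as $N$ grows.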
Algorithm 1: Generating expansion coefficients relative to the basis vectors in the GHWT basis dictionary [14,15,18]. Input: A binary partition tree $\{G^j_k\}$ of the graph $G$, $0 \le j \le j_{\max}$ and $0 \le k < K^j$; an input graph signal $f$ supported on $V(G)$; a flag bvecq (default: false) to compute the GHWT basis vectors. Output: The set of expansion coefficients $\{c^j_{k,l}\}$ of the signal on the overcomplete GHWT dictionary vectors $\{\psi^j_{k,l}\}$, which are also returned if bvecq is set to true.
Fig. 3: The GHWT c2f and f2c dictionaries on $P_6$. Stem plots with black, red, and blue colors correspond to the scaling, Haar, and Walsh vectors, respectively.
Fig. 4: Examples of basis vectors from the GHWT dictionary computed on the Toronto street network. The color scheme called viridis [35] is used to represent the amplitude of these vectors, ranging from deep violet (negative) to dark green (zero) to yellow (positive).

The Best-Basis Algorithm in the GHWT
To select an ONB from a dictionary of wavelet packets that is "best" for approximation/compression, we typically use the so-called best-basis algorithm. The one used for the GHWT in [14,15,18] was a straightforward generalization of the Coifman-Wickerhauser algorithm [7], which was developed for non-graph signals of dyadic length. This algorithm requires a real-valued cost function $\mathcal{J}$ measuring the approximation/compression inefficiency of the subspaces in the dictionary, and aims to find an ONB whose coefficients minimize $\mathcal{J}$ (i.e., the most efficient ONB for approximating/compressing an input signal), which we refer to as the "best basis" chosen from the GHWT dictionary. The algorithm initializes the best basis as the whole set of vectors at the bottom level of the dictionary. Then it proceeds upwards, comparing the cost of the expansion coefficients corresponding to two children subgraphs to the cost of those of their parent subgraph. The best basis is updated if the cost of the parent subgraph is smaller than that of its children subgraphs. The algorithm continues until it reaches the top (i.e., the root) of the binary partition tree (i.e., the dictionary).
The best-basis algorithm works as long as $\mathcal{J}$ is nonnegative and additive, i.e., of the form $\mathcal{J}(\boldsymbol{c}) := \sum_i g(c_i)$ with $g : \mathbb{R} \to \mathbb{R}_{\ge 0}$, where $\boldsymbol{c}$ is the vector of expansion coefficients of an input graph signal on a region. For example, if one wants to promote sparsity in graph signal representation or approximation, $\mathcal{J}$ can be chosen as either the $p$th power of the $\ell^p$-(quasi)norm, $\sum_i |c_i|^p$ for $0 < p < 2$, or the $\ell^0$-pseudonorm, $\#\{i : c_i \ne 0\}$. Note that the smaller the value of $p$ is, the more emphasis is placed on sparsity.
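The bottom-up comparison described above, with an additive cost, can be sketched as follows (an illustrative Python implementation on a perfect binary tree of precomputed subspace costs; the function names and toy numbers are ours):

```python
def l1_cost(coeffs):
    """Additive sparsity cost: J(c) = sum_i |c_i| (the p = 1 case)."""
    return sum(abs(c) for c in coeffs)

def best_basis(cost, jmax):
    """Bottom-up Coifman-Wickerhauser search on a perfect binary tree.
    cost[(j, k)] is the cost of region k's own coefficients on level j;
    returns the chosen (j, k) nodes and the total best cost."""
    best, keep = {}, {}
    for k in range(2**jmax):                       # initialize at the bottom
        best[(jmax, k)] = cost[(jmax, k)]
        keep[(jmax, k)] = True
    for j in range(jmax - 1, -1, -1):              # sweep upwards
        for k in range(2**j):
            kids = best[(j + 1, 2*k)] + best[(j + 1, 2*k + 1)]
            keep[(j, k)] = cost[(j, k)] <= kids
            best[(j, k)] = min(cost[(j, k)], kids)
    chosen, stack = [], [(0, 0)]                   # read off the best basis
    while stack:
        j, k = stack.pop()
        if keep[(j, k)]:
            chosen.append((j, k))
        else:
            stack += [(j + 1, 2*k), (j + 1, 2*k + 1)]
    return sorted(chosen), best[(0, 0)]

# Toy costs on a depth-2 tree: the root is expensive, the left child is
# cheap as a whole, and the right child is better split further.
cost = {(0, 0): 10.0, (1, 0): 1.0, (1, 1): 5.0,
        (2, 0): 2.0, (2, 1): 2.0, (2, 2): 1.0, (2, 3): 3.0}
basis, total = best_basis(cost, 2)
print(basis, total)  # [(1, 0), (2, 2), (2, 3)] 5.0
```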
Note that one can search the c2f and f2c dictionaries separately to obtain two best bases, among which the one with the smaller cost is chosen as the final best basis of the GHWT dictionaries. We also note here that the graph Haar basis is selectable only in the GHWT f2c dictionary while the graph Walsh basis is selectable in either dictionary.

The Extended GHWT (eGHWT)
In this section, we describe the extended GHWT (eGHWT): our new best-basis algorithm on the GHWT dictionaries, which simultaneously considers the "time" domain split and "frequency" (or "sequency") domain split of an input graph signal. This transform will allow us to deploy the modified best-basis algorithm that can select the best ONB for one's task (e.g., efficient approximation, denoising, etc.) among a much larger set ($> 0.618 \cdot (1.84)^N$) of ONBs than the GHWT c2f/f2c dictionaries could provide ($> (1.5)^N$). The previous best-basis algorithm only searches through the c2f dictionary and the f2c dictionary separately, but this new method makes use of those two dictionaries simultaneously. In fact, the performance of the eGHWT, by its construction, is always superior to that of the GHWT, which is clearly demonstrated in our numerical experiments in Section 5.
Our eGHWT is the graph version of the Thiele-Villemoes algorithm [49], which finds the best basis among the ONBs of $\mathbb{R}^N$ consisting of discretized and rescaled Walsh functions for a 1D non-graph signal (i.e., a signal discretized on a 1D regular grid) of length $N$, where $N$ must be a dyadic integer. Their algorithm operates in the time-frequency plane and constructs its tiling with minimal cost among all possible tilings with dyadic rectangles of area one, which clearly depends on the input signal. Here we adapt their method to our graph setting, which does not require dyadic $N$. In addition, the generalization of the Thiele-Villemoes algorithm for 2D signals developed by Lindberg and Villemoes [23] can be generalized to the 2D eGHWT, as we will discuss further in Sections 4.5 and 5.2.

Fast Adaptive Time-Frequency Tilings
In this subsection, we briefly review the Thiele-Villemoes algorithm [49]. First of all, let us define the so-called Walsh system, which forms an ONB for $L^2[0, 1)$. Let $w_0(x) = 1$ for $0 \le x < 1$ and zero elsewhere, and define $w_1, w_2, \ldots$ recursively by $w_{2n}(x) = w_n(2x) + (-1)^n w_n(2x - 1)$ and $w_{2n+1}(x) = w_n(2x) - (-1)^n w_n(2x - 1)$. Then $\{w_n\}_{n=0}^{\infty}$ is an ONB for $L^2[0, 1)$ and is referred to as the Walsh system in sequency order. Each basis function $w_n(x)$ is piecewise equal to either 1 or $-1$ on $[0, 1)$. Note that the scaling and Haar vectors at the global scale are included in this Walsh system.
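In discrete form, the sequency-ordered Walsh system on $2^J$ sample points can be generated by the same recursion (an illustrative sketch; the helper name walsh_sequency is ours, and the rows are left unnormalized so that all entries are $\pm 1$):

```python
import numpy as np

def walsh_sequency(J):
    """Sample w_0, ..., w_{2^J - 1} on 2^J equispaced points, sequency order:
    row 2n is [w_n, (-1)^n w_n] and row 2n+1 is [w_n, -(-1)^n w_n]."""
    W = np.array([[1.0]])
    for _ in range(J):
        n = W.shape[0]
        nxt = np.zeros((2 * n, 2 * n))
        for i in range(n):
            s = (-1)**i
            nxt[2*i]     = np.concatenate([W[i],  s * W[i]])
            nxt[2*i + 1] = np.concatenate([W[i], -s * W[i]])
        W = nxt
    return W

W = walsh_sequency(3)
changes = [(np.diff(row) != 0).sum() for row in W]
print(changes)  # [0, 1, 2, 3, 4, 5, 6, 7]: sequency increases with the index
```

The rows are mutually orthogonal, and the sign-change count of row $n$ is exactly $n$, which is what "sequency order" means.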
Viewing $Q := [0, 1) \times [0, \infty)$ as a time-frequency plane, the tile corresponding to the rescaled and translated Walsh function (which we also refer to as the Haar-Walsh function) $w_{j,k,l}(x) := 2^{j/2} w_l(2^j x - k)$ is the dyadic rectangle $Q_{j,k,l} := [2^{-j}k, 2^{-j}(k+1)) \times [2^j l, 2^j(l+1))$. Note that the area of $Q_{j,k,l}$ is 1. Thiele and Villemoes showed that the functions $w_{j,k,l}$ and $w_{j',k',l'}$ are orthogonal if and only if the tiles $Q_{j,k,l}$ and $Q_{j',k',l'}$ are disjoint. Moreover, if a collection of tiles forms a disjoint covering of a given dyadic rectangle (i.e., a rectangle with dyadic sides) in the time-frequency plane, then the Haar-Walsh functions of those tiles form an ONB of the subspace corresponding to that dyadic rectangle. Now the 1D discrete signal space $\mathbb{R}^N$ ($N = 2^n$) can be identified with the subset $Q_n := [0, 1) \times [0, N)$ of $Q$ in the time-frequency plane. Given the overcomplete dictionary of Haar-Walsh functions on $\mathbb{R}^N$, the best-basis algorithm is now equivalent to finding a set of basis vectors for a given input signal with minimal cost that also generates a disjoint tiling of $Q_n$. That tiling is called the minimizing tiling. A key lemma of [49] tells us that the minimizing tiling of a dyadic rectangle can be split either in the time direction into two minimizing tilings of its left and right halves, or in the frequency direction into those of its bottom and top halves. It enables a dynamic programming algorithm to find the minimizing tiling of $Q_n$; see Algorithm 3.3 of [49].

Relabeling Region Indices
If the input graph is a simple path graph with dyadic $N$ and the partition tree is a balanced complete binary tree, then the GHWT dictionary is the same as the classical Haar-Walsh wavelet packet dictionary for 1D regular signals, on which the Thiele-Villemoes algorithm [49] can be applied in a straightforward manner. To adapt the algorithm to a graph signal of an arbitrary length or an imperfect binary partition tree of an input graph, we need to modify the GHWT dictionary first. Specifically, the region index $k$ of $G^j_k$ and $\psi^j_{k,l}$ needs to be relabeled. Previously, on level $j$, the region index $k$ took all the integer values in $[0, K^j)$, where $K^j \le 2^j$ is the total number of subgraphs (or regions) on level $j$. After relabeling, $k$ takes an integer value in $[0, 2^j)$ according to its location in the binary tree. Algorithm 2 precisely describes the whole procedure.
Algorithm 2: Relabeling the GHWT Dictionary. Input: A binary partition tree denoted by $\{G^j_k\}$, $0 \le j \le j_{\max}$, $0 \le k < K^j$. Output: A perfect binary partition tree $\{\widetilde{G}^j_k\}$ containing $\{G^j_k\}$ as its subset. // The algorithm starts here. $\widetilde{G}^0_0 \leftarrow G^0_0$ // On level 0, there is only one region $G^0_0$, so no relabeling is required.
A couple of remarks are in order. First,
the region indices of the basis vectors $\{\psi^j_{k,l}\}$, $0 \le k < K^j$, are also relabeled accordingly as $\{\widetilde{\psi}^j_{k,l}\}$ supported on the subgraph $\widetilde{G}^j_k$, where $0 \le k < 2^j$. Note that some of the basis vectors in $\{\widetilde{\psi}^j_{k,l}\}$ that do not have corresponding basis vectors in the original GHWT dictionary $\{\psi^j_{k,l}\}$ are "fictitious" (or "non-existent") ones and can be set as the zero vectors. In practice, however, we do not even need to store them as the zero vectors; we simply do not compute the costs corresponding to such fictitious basis vectors. Second, to simplify our notation, we simply assume that $\{\psi^j_{k,l}\}$ and $\{G^j_k\}$ are those already relabeled by Algorithm 2 in the rest of this article. Figure 5b shows the result of Algorithm 2 applied to the GHWT c2f dictionary shown in Fig. 5a on $P_6$. Before the relabeling, the dictionary forms a complete but imperfect binary tree. As one can see, after the relabeling, the initial GHWT c2f dictionary is a subset of a perfect binary tree shown in Fig. 5b.
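The effect of the relabeling can be sketched on the $P_6$ tree of Fig. 1 (an illustrative helper of our own; regions are represented as nested tuples, and $(j, k)$ slots absent from the output correspond to fictitious regions):

```python
def relabel(region, j=0, k=0, out=None):
    """Embed an imperfect bipartition tree into a perfect one: the children
    of region (j, k) get indices 2k and 2k + 1 on level j + 1.
    Returns a dict {(j, k): node list}."""
    if out is None:
        out = {}
    if isinstance(region, tuple):            # internal region with two children
        left, right = region
        out[(j, k)] = collect(left) + collect(right)
        relabel(left, j + 1, 2 * k, out)
        relabel(right, j + 1, 2 * k + 1, out)
    else:                                    # leaf: a single-node region
        out[(j, k)] = list(region)
    return out

def collect(region):
    """Flatten a (sub)tree into its node list."""
    if isinstance(region, tuple):
        return collect(region[0]) + collect(region[1])
    return list(region)

# P6 tree from Fig. 1, with leaves given as node-id lists
tree = ((([0], [1]), [2]), (([3], [4]), [5]))
labels = relabel(tree)
print(sorted(labels))  # note the gaps: e.g., (3, 2) and (3, 3) are fictitious
```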

The New Best-Basis (eGHWT) Algorithm
We can now apply Algorithm 3 to search for the best basis in the relabeled GHWT dictionary, which has a perfect binary partition tree, similarly to the Thiele-Villemoes algorithm. For simplicity, we refer to this whole procedure as the eGHWT. In order to understand this algorithm, let us first remark that we use the so-called associative array: an abstract data type composed of a collection of (key, value) pairs such that each possible key appears at most once in the collection. The reason why we use associative arrays instead of regular arrays is to save storage space while keeping the algorithm flexible and efficient without losing the convenience of manipulating arrays. This point is important since many basis vectors $\psi^j_{k,l}$ after relabeling via Algorithm 2 may be fictitious, which we need to neither store nor compute: using regular matrices to store them would be wasteful. For example, Algorithm 3 has a statement like $A_0[j, k, l] \leftarrow \mathcal{J}(\langle f, \psi^j_{k,l} \rangle)$, where $A_0$ is an associative array. This actually means that $((j, k, l), \mathcal{J}(\langle f, \psi^j_{k,l} \rangle))$ is a (key, value) pair of the associative array $A_0$. Here we write $(j, k, l) \in A_0$ to denote that $(j, k, l)$ is a valid key of $A_0$. Therefore, $(j, k, l) \in A_0$ if and only if a non-fictitious $\psi^j_{k,l}$ exists.
Several remarks on this algorithm are in order:
- When we compute $A_i$ from $A_{i-1}$, we concatenate the subspaces: each entry of $A_i$ records the minimal cost of representing the direct sum of two subspaces recorded in $A_{i-1}$. This process is similar to finding the best tilings for dyadic rectangles from those of half the size in the Thiele-Villemoes algorithm [49], as described in Sect. 4.1.
- If the input graph is a simple 1D path with dyadic $N$, and if we view an input graph signal as a discrete-time signal, then the pair $(A_{i-1}[j, k, 2l], A_{i-1}[j, k, 2l+1])$ corresponds to splitting the subspace of $A_i[j, k, l]$ in the frequency domain in the time-frequency plane while the pair $(A_{i-1}[j+1, 2k, l], A_{i-1}[j+1, 2k+1, l])$ does the split in the time domain.
- The subspace of each entry in $A_0$ is one dimensional since it is spanned by a single basis vector. In other words, $A_0[j, k, l]$ corresponds to $\operatorname{span}\{\psi^j_{k,l}\}$: the one-dimensional subspace spanned by $\psi^j_{k,l}$.
- $A_{j_{\max}}$ has only one entry, $A_{j_{\max}}[0, 0, 0]$, which corresponds to the whole $\mathbb{R}^N$. Its value is the minimum cost among all the choosable ONBs, i.e., the cost of the best basis.
- If an input graph signal is of dyadic length, Algorithm 3 selects the best basis among a much larger set ($> 0.618 \cdot (1.84)^N$) of ONBs than what each of the GHWT c2f and f2c dictionaries would provide ($> (1.5)^N$) [49]. The numbers are similar even for non-dyadic cases as long as the partition trees are essentially balanced. The essence of this algorithm is that at each step of the recursive evaluation of the costs of subspaces, it compares the cost of the parent subspace not only with its two children subspaces partitioned in the "frequency" domain (as the GHWT f2c does), but also with its two children subspaces partitioned in the "time" domain (as the GHWT c2f does).
- If the underlying graph of an input graph signal is a simple unweighted path graph of dyadic length, i.e., $N = 2^n$, $n \in \mathbb{N}$, then Algorithm 3 reduces to the Thiele-Villemoes algorithm exactly. Note that in such a case, neither computing the Fiedler vectors of subgraphs nor relabeling the subgraphs via Algorithm 2 is necessary; it is more efficient to force the midpoint partition at each level explicitly in that case.
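For a 1D signal of dyadic length, the whole search can be sketched end to end: first compute the Haar-Walsh coefficient tables bottom-up, then run the dynamic program that, at each step, keeps the cheaper of the frequency split and the time split (an illustrative Python implementation with an $\ell^1$-type cost; the function names are ours, and the coefficient recursion uses the equal-size merging rule, which is valid here since the tree is perfect):

```python
import numpy as np

def haar_walsh_coeffs(f):
    """GHWT-style Haar-Walsh coefficient tables for a dyadic-length signal.
    coef[j] has shape (2^j, N/2^j): row k is region k, column l is tag l."""
    N = len(f); J = N.bit_length() - 1
    coef = [None] * (J + 1)
    coef[J] = np.asarray(f, float).reshape(N, 1)   # finest level: the signal
    for j in range(J - 1, -1, -1):
        a, b = coef[j + 1][0::2], coef[j + 1][1::2]       # left/right children
        c = np.empty((2**j, N >> j))
        c[:, 0] = (a[:, 0] + b[:, 0]) / np.sqrt(2)        # scaling (tag 0)
        c[:, 1] = (a[:, 0] - b[:, 0]) / np.sqrt(2)        # Haar (tag 1)
        for l in range(1, a.shape[1]):                    # Walsh tags
            c[:, 2*l]     = (a[:, l] + b[:, l]) / np.sqrt(2)
            c[:, 2*l + 1] = (a[:, l] - b[:, l]) / np.sqrt(2)
        coef[j] = c
    return coef

def eghwt_best_cost(f, cost=abs):
    """Dynamic program over both time and frequency splits (a sketch of the
    eGHWT/Thiele-Villemoes search); returns the cost of the best basis."""
    coef = haar_walsh_coeffs(f)
    N = len(f); J = N.bit_length() - 1
    A = {(j, k, l): cost(coef[j][k, l])                     # A_0: single tiles
         for j in range(J + 1) for k in range(2**j) for l in range(N >> j)}
    for i in range(1, J + 1):                               # subspaces of dim 2^i
        Anew = {}
        for j in range(J - i + 1):
            for k in range(2**j):
                for l in range(N >> (i + j)):
                    fsplit = A[(j, k, 2*l)] + A[(j, k, 2*l + 1)]       # frequency
                    tsplit = A[(j + 1, 2*k, l)] + A[(j + 1, 2*k + 1, l)]  # time
                    Anew[(j, k, l)] = min(fsplit, tsplit)
        A = Anew
    return A[(0, 0, 0)]

print(eghwt_best_cost([1.0, 0.0, 0.0, 0.0]))  # 1.0: for a spike, the standard basis wins
```

Since the $\ell^1$-norm of the coefficients in any ONB is bounded below by the $\ell^2$-norm of the signal, the printed value 1.0 is indeed optimal for the spike.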

The eGHWT Illustrated by a Simple Graph Signal on $P_6$
The easiest way to appreciate and understand the eGHWT algorithm is through a simple example. Let $f = [2, -2, 1, 3, -1, -2]^{\mathsf{T}} \in \mathbb{R}^6$ be an example graph signal on $G = P_6$. The $\ell^1$-norm is chosen as the cost function.
Figure 6a shows the coefficients of that signal on the GHWT c2f dictionary, and Fig. 6b corresponds to the GHWT c2f dictionary relabeled by Algorithm 2. After the GHWT c2f dictionary for $P_6$ is relabeled via Algorithm 2, the dictionary tree has the same structure as that of $P_8$. Figure 7 illustrates the progression of Algorithm 3 on this simple graph signal. The time-frequency plane in this case is $Q_3 = [0, 1) \times [0, 8)$, and the frequency axis is scaled in this figure so that $Q_3$ is a square for visualization purposes. The collection of all 32 tiles and the corresponding expansion coefficients are placed on the four copies of $Q_3$ in the top row of Fig. 7, which are ordered from the finest time/coarsest frequency partition ($j = 3$) to the coarsest time/finest frequency partition ($j = 0$). Note that each tile has unit area.
In the first iteration ($i = 1$) in Fig. 7, the minimizing tilings for all the dyadic rectangles of area 2 are computed from those of unit area. For example, the leftmost dyadic rectangle (showing its cost $2\sqrt{2}$) at $i = 1$ can be composed of either the pair of tiles $(Q^3_{0,0}, Q^3_{1,0})$ or the pair $(Q^2_{0,0}, Q^2_{0,1})$. The corresponding costs are $|2| + |-2| = 4$ and $2\sqrt{2} + 0 = 2\sqrt{2}$, respectively, and the minimum of the two is $2\sqrt{2}$. The minimizing tiling for that dyadic rectangle hence consists of $Q^2_{0,0}$ and $Q^2_{0,1}$, with cost $2\sqrt{2}$. In the second iteration ($i = 2$), the algorithm finds the minimizing tilings for all dyadic rectangles of area 4. In the third iteration ($i = 3$), the algorithm finds the minimizing tiling for the whole $Q_3$. After the best basis is found in the relabeled GHWT c2f dictionary, we remove the "fictitious" zero basis vectors and undo the relabeling done by Algorithm 2 to get back to the original $(j, k, l)$ indices.
Figure 8 shows that the GHWT c2f best basis for this simple graph signal is actually the graph Walsh basis, whose representation has cost $\approx 7.84$, while Fig. 9 demonstrates that the best-basis representation chosen by the eGHWT algorithm has cost $\approx 7.45$, the smallest among these best-basis representations. The indices used here are the original ones. Figure 9 clearly demonstrates that the eGHWT best basis cannot be obtained by simply applying the previous GHWT best-basis algorithm on the c2f and f2c dictionaries. More specifically, let us consider the vectors $\psi^1_{1,0}$ and $\psi^0_{0,4}$ in the eGHWT best basis. From Fig. 9a, we can see that $\psi^1_{1,0}$ is supported on the child graph $G^1_1$ that was generated by bipartitioning the input graph $G^0_0$ on which $\psi^0_{0,4}$ is supported. Therefore, they cannot be selected in the GHWT c2f best basis simultaneously. A similar argument applies to $\psi^2_{1,0}$ and $\psi^1_{1,0}$ in the eGHWT best basis as shown in Fig. 9b: they cannot be selected in the GHWT f2c best basis simultaneously.
Fig. 9: The eGHWT best-basis vectors for the signal $f = [2, -2, 1, 3, -1, -2]^{\mathsf{T}}$ selected by Algorithm 3 (indicated in red), displayed within the GHWT c2f dictionary (a) and the GHWT f2c dictionary (b). Note the orthogonality of these vectors, and compare them with those shown in Fig. 8.

Generalization to 2D Signals/Matrix Data
The Thiele-Villemoes algorithm [49] has been extended to 2D signals by Lindberg and Villemoes [23]. Like the former, the Lindberg-Villemoes algorithm works only for 2D signals with dyadic side lengths. We want to generalize the Lindberg-Villemoes algorithm to more general 2D signals or matrix data whose sides are not necessarily dyadic, as we did for the previous GHWT dictionaries in [17]. Before describing our 2D generalization of the eGHWT algorithm, let us briefly review the Lindberg-Villemoes algorithm. As we described in Sect. 4.1, each step of the Thiele-Villemoes algorithm chooses the bipartition with the smaller cost between that of the time domain and that of the frequency domain. For 2D signals, the time-frequency domain has four axes instead of two: there is a time axis and a frequency axis for each of the $x$ and $y$ components (i.e., the columns and rows of an input image). The best tiling comes from splits in the time or frequency direction in either the $x$ or the $y$ component, so each step chooses the best tiling/split among four options instead of two. As in the 1D case, dynamic programming is used to find the minimizing tiling for a given 2D signal. We note that our 2D generalized version of the eGHWT algorithm reduces exactly to the Lindberg-Villemoes algorithm if the input graph is a 2D regular lattice of dyadic size and the midpoint (or, more precisely, midline) partition is used throughout.
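The four-way selection can be sketched analogously. The following is a hypothetical illustration (not the Lindberg-Villemoes code): a 2D phase-space box is represented as the product of a row box and a column box, each a dyadic rectangle of an n-by-n (time-by-frequency) grid of cells, and at every step the recursion takes the cheapest of up to four splits: time or frequency, in the row factor or the column factor. The representation and function names are our assumptions.

```python
from functools import lru_cache

def min_tiling_cost_2d(n, tile_cost):
    """Minimal tiling cost for a 2D phase-space box = (row box) x (col box),
    where a box is (x, y, w, h); leaves are products of two area-n tiles."""
    def halves(b):
        x, y, w, h = b
        time = [((x, y, w // 2, h), (x + w // 2, y, w // 2, h))] if w > 1 else []
        freq = [((x, y, w, h // 2), (x, y + h // 2, w, h // 2))] if h > 1 else []
        return time + freq          # the admissible splits of one factor

    @lru_cache(maxsize=None)
    def best(row, col):
        leaf_r = row[2] * row[3] == n
        leaf_c = col[2] * col[3] == n
        if leaf_r and leaf_c:       # product of two tiles: leaf
            return tile_cost(row, col)
        cands = []
        if not leaf_r:              # split the row factor (time or frequency)
            cands += [best(a, col) + best(b, col) for a, b in halves(row)]
        if not leaf_c:              # split the column factor (time or frequency)
            cands += [best(row, a) + best(row, b) for a, b in halves(col)]
        return min(cands)           # cheapest of up to four splits

    full = (0, 0, n, n)
    return best(full, full)
```

With a constant unit tile cost and n = 2, every admissible tiling consists of four leaf products, so the minimal cost is 4.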
For more general 2D signals or matrix data, we can compose affinity matrices in the row and column directions separately (see, e.g., [17]), and thus define graphs on which the rows and columns are supported. In this way, the input 2D signal can be viewed as living on a tensor product of two graphs. The eGHWT can then be extended from 1D to 2D signals in the same way that Lindberg and Villemoes [23] extended the Thiele-Villemoes algorithm [49]. Examples will be given in Sect. 5.2.

Applications
In this section, we demonstrate the usefulness and efficiency of the eGHWT using several real datasets, and compare its performance with that of the classical Haar transform, the graph Haar basis, the graph Walsh basis, and the GHWT c2f/f2c best bases. We note that our performance comparison is meant to emphasize the difference between the eGHWT and its close relatives. Hence we will not compare the performance of the eGHWT with graph wavelets and wavelet packets of a different nature; see, e.g., [4,18] for further information.

Efficient Approximation of a Graph Signal
Here we analyze the eight-peak-hour vehicle volume counts on the Toronto street network, which is shown in Fig. 10. We have already described this street network in Sect. 2 and in Fig. 2. The data were typically collected at street intersections equipped with traffic lights between the hours of 7:30 am and 6:00 pm, over the period 03/22/2004-02/28/2018. As one can see, the vehicular volumes are spread over various parts of this street network, with a concentration in the northeastern region.
In addition to the eGHWT best basis, the graph Haar basis, the graph Walsh basis, and the GHWT c2f/f2c best bases are used for performance comparison. Figure 10b shows the result. The $y$-axis denotes the relative approximation error $\|f - f_n\|_2 / \|f\|_2$, where $f_n$ denotes the approximation of $f$ using the $n$ basis vectors having the largest coefficients in magnitude. The $x$-axis denotes $n/N$, i.e., the fraction of coefficients retained. We can see that the error of the eGHWT decays fastest, followed by the GHWT f2c best basis, the graph Haar basis, and the GHWT c2f best basis (which chose the graph Walsh basis), in that order. In Fig. 11, we display the nine most significant basis vectors (except for the DC vector) of the graph Haar basis, the GHWT c2f best basis (which selected the graph Walsh basis), the GHWT f2c best basis, and the eGHWT best basis used to approximate this traffic volume data. These figures clearly demonstrate the superiority of the eGHWT best basis over the graph Haar basis, the graph Walsh basis, and the regular GHWT best bases. The top nine basis vectors displayed in Fig. 11 show the characteristics of each basis under consideration. The graph Haar basis vectors are non-oscillatory, blocky vectors with positive and negative components at various locations and scales. The graph Walsh basis vectors (= the GHWT c2f best-basis vectors) are all globally oscillatory piecewise-constant vectors. The GHWT f2c best-basis vectors and the eGHWT best-basis vectors are similar and can capture the multiscale and multi-frequency features of the input graph signal. Yet the performance of the eGHWT best basis exceeds that of the GHWT f2c best basis simply because the former searches for the best basis among a much larger collection of ONBs than the latter.
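Such error curves can be computed directly from the expansion coefficients via Parseval's relation: in an ONB, the squared error of the best n-term approximation equals the energy of the discarded coefficients. A minimal sketch (the function name is ours):

```python
import numpy as np

def relative_error_curve(coeffs):
    """Relative l2 approximation error ||f - f_n|| / ||f|| for n = 0..N,
    where f_n keeps the n largest-magnitude ONB coefficients of f.
    By Parseval, the error energy is the sum of the discarded squares."""
    c2 = np.sort(np.abs(np.asarray(coeffs, dtype=float))) ** 2  # ascending
    cum = np.cumsum(c2)            # cum[k] = energy of the (k+1) smallest
    # err[n] = sqrt(energy of the N-n smallest) / ||f||, with err[N] = 0
    return np.sqrt(np.concatenate((cum[::-1], [0.0]))) / np.sqrt(cum[-1])
```

For example, coefficients [3, 4] give the curve [1.0, 0.6, 0.0]: keeping only the largest coefficient (4) leaves the relative error 3/5.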

Viewing a General Matrix Signal as a Tensor Product of Graphs
To analyze and process a single graph signal, we can use the eGHWT to produce a suitable ONB. For a collection of signals in matrix form (including regular digital images), we can also compose the affinity matrix of the rows and that of the columns separately, and thus define graphs on which the rows and columns are supported, as was done previously [5,17]. Those affinity matrices can either be computed from the similarity of rows or columns directly, or be composed from information outside the original matrix signal. For example, Kalofolias et al. [21] used row and column graphs to analyze recommender systems. After the affinity graphs on rows and columns are obtained, we can use the eGHWT to produce ONBs on rows and columns separately. The matrix signal can then be analyzed or compressed using the tensor product of those two ONBs. In addition, as mentioned in Sect. 4.5, we have also extended the eGHWT to the tensor product of row and column affinity graphs, searching for the best 2D ONB on the matrix signal directly. Note that we can also specify or compute the binary partition trees in a non-adaptive manner (e.g., recursively splitting at the middle of each region), typically for signals supported on a regular lattice.
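In matrix form, analysis and compression with such a tensor product of ONBs amounts to a two-sided change of basis. The following small sketch assumes the two ONBs are given as orthogonal matrices Phi_r and Phi_c whose columns are the basis vectors (e.g., as produced by the eGHWT on the row and column graphs); the helper name is ours.

```python
import numpy as np

def tensor_approx(A, Phi_r, Phi_c, n_keep):
    """Approximate matrix A in the tensor-product ONB Phi_r (x) Phi_c,
    keeping (at least) the n_keep largest coefficients in magnitude.
    Ties at the threshold may keep a few extra coefficients."""
    C = Phi_r.T @ A @ Phi_c                     # expansion coefficients
    flat = np.sort(np.abs(C).ravel())[::-1]     # magnitudes, descending
    thresh = flat[n_keep - 1] if n_keep >= 1 else np.inf
    C_kept = np.where(np.abs(C) >= thresh, C, 0.0)
    return Phi_r @ C_kept @ Phi_c.T             # two-sided reconstruction
```

With the 2-by-2 Haar ONB on both sides, keeping all four coefficients reproduces A exactly, while keeping only the largest one reproduces the mean of A (the "DC" term).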

Approximation of the Barbara Image
In this section, we compare the performance of various bases in approximating the famous Barbara image shown in Fig. 12a, and demonstrate that the eGHWT can be applied to a conventional digital image in a straightforward manner and outperforms the regular GHWT best bases. This image consists of 512 × 512 pixels. Figure 12b displays the relative $\ell^2$ errors of the approximations by the graph Haar basis, the graph Walsh basis, the GHWT c2f/f2c best bases, and the eGHWT best basis as a function of the fraction of the coefficients retained.
In order to examine the quality of the approximations visually, Fig. 13 displays those approximations using only 1/32 = 3.125% of the most significant expansion coefficients for each basis. The eGHWT best basis outperforms all the others, with the least blocky artifacts. To examine the visual quality of these approximations further, Fig. 14 shows zoomed-in views of the face and left leg in those approximations. Especially in the leg region, which has a specific texture, i.e., stripe patterns, the eGHWT outperformed the rest of the methods. The performance is measured by PSNR (peak signal-to-noise ratio).
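For reference, the PSNR of an 8-bit image approximation can be computed with the standard formula (shown here for completeness; higher values indicate a closer approximation):

```python
import numpy as np

def psnr(original, approx, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 log10(peak^2 / MSE)."""
    diff = np.asarray(original, dtype=float) - np.asarray(approx, dtype=float)
    mse = np.mean(diff ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```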
We note that in both this approximation experiment on the Barbara image and that on the vehicular volume count data on the Toronto street network discussed in Sect. 5.1, the order of performance was the same: eGHWT > GHWT f2c > graph Haar > GHWT c2f ≥ graph Walsh. This order in fact makes sense for these input datasets: the eGHWT simply searches a much larger collection of ONBs than the GHWT best bases; the graph Haar basis is a part of the GHWT f2c dictionary; and the graph Walsh basis is a part of both the GHWT c2f and f2c dictionaries. The graph Walsh basis for these graph signals was too global, while the GHWT c2f best basis tended to select nonlocal features compared with the eGHWT, the GHWT f2c best basis, and the graph Haar basis.

The Haar Transform for Images with Non-Dyadic Size
For images of non-dyadic size, there is no straightforward way to obtain the partition trees in a non-adaptive manner, unlike the dyadic 512 × 512 Barbara image in the previous subsection. This is a common problem faced by the classical Haar and wavelet transforms as well: they were designed to work on images of dyadic size. Non-dyadic images are often modified by zero padding, even extension at the boundary pixels, or other methods before the Haar transform is applied. We propose to apply the Haar transform on a non-dyadic image without modifying the input image, using the eGHWT dictionary. To obtain the binary partition trees, we need to cut an input image $Y \in \mathbb{R}^{m \times n}$ horizontally or vertically into two parts recursively. Apart from using the affinity matrices as we did for the vehicular volume count data on the Toronto street network in Sect. 5.1 and for the term-document matrix analysis in [17], we propose to use the penalized total variation (PTV) cost to partition a non-dyadic input image. Denote the two sub-parts of $Y$ as $Y_1$ and $Y_2$. We search for the optimal cut minimizing the PTV cost, i.e., the sum of the total variations of $Y_1$ and $Y_2$ divided by a power (controlled by the parameter $\lambda$) of $\min(N_1, N_2)$, where $N_i$ indicates the number of pixels in $Y_i$, $i = 1, 2$. The denominator is used to make sure that the sizes of $Y_1$ and $Y_2$ are close, so that the tree becomes nearly balanced. Recursively applying the horizontal cut on the rows of $Y$ and the vertical cut on the columns of $Y$ gives us two binary partition trees. We can then select the 2D Haar basis from the eGHWT dictionary, or search for the best basis with minimal cost (note that the cost function for the best-basis search is the $\ell^1$-norm of the expansion coefficients, which is different from the PTV cost above).
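The search for the best cut can be sketched as an exhaustive scan over the candidate cut positions. The sketch below assumes a PTV cost of the form $(\|Y_1\|_{\mathrm{TV}} + \|Y_2\|_{\mathrm{TV}})/\min(N_1, N_2)^{\lambda}$; this specific form, the discrete TV surrogate, and the function names are our illustrative assumptions, not necessarily the exact formula of the paper.

```python
import numpy as np

def tv(Y):
    """Discrete total variation surrogate: sum of absolute horizontal
    and vertical first differences of the 2D array Y."""
    return np.abs(np.diff(Y, axis=0)).sum() + np.abs(np.diff(Y, axis=1)).sum()

def best_horizontal_cut(Y, lam=3.0):
    """Row index i minimizing the assumed PTV cost
    (tv(Y1) + tv(Y2)) / min(N1, N2)**lam over cuts Y1 = Y[:i], Y2 = Y[i:]."""
    best_i, best_cost = None, np.inf
    for i in range(1, Y.shape[0]):
        Y1, Y2 = Y[:i, :], Y[i:, :]
        cost = (tv(Y1) + tv(Y2)) / min(Y1.size, Y2.size) ** lam
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i
```

The vertical cut is analogous (apply the same scan to the transpose); alternating the two recursively yields the two binary partition trees.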
To demonstrate this strategy, we chose an image patch of size 100 × 100 around the face from the original 512 × 512 Barbara image so that its size is non-dyadic. To determine the value of $\lambda$, we need to balance the total variation against the structure of the partition tree: a larger $\lambda$ places less emphasis on the total variation after a split, but a more balanced partition tree may be obtained. The value of $\lambda$ can be fine-tuned based on the evaluation of the final task, for example, the area under the relative approximation error curve in a compression task [18]. In the numerical experiments below, after conducting preliminary partitioning experiments using the PTV cost, we decided to choose $\lambda = 3$.
For comparison, we also used the dwt function supplied by the Wavelets.jl package [48] for the classical Haar transform. We examined three different scenarios here: 1) directly inputting the Barbara face image of size 100 × 100; 2) zero padding on all four sides of the original image to make it 128 × 128; and 3) even reflection at the borders to make it 128 × 128.
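The two padded baselines can be reproduced with a few lines of NumPy. This is a sketch: the function name and the convention of splitting the padding as evenly as possible between the two sides of each axis are our assumptions (the paper does not specify the distribution of the padding).

```python
import numpy as np

def pad_to_dyadic(img, mode):
    """Pad an image up to the next power of two along each axis.
    mode = "zero" (zero padding) or "reflect" (even reflection at borders)."""
    pads = []
    for s in img.shape:
        n = 1
        while n < s:                # next power of two >= s
            n *= 2
        before = (n - s) // 2
        pads.append((before, n - s - before))
    # NumPy's "symmetric" mode repeats the border pixel, i.e., even reflection
    return np.pad(img, pads, mode="constant" if mode == "zero" else "symmetric")
```

For a 100-by-100 input, both modes produce a 128-by-128 image, matching scenarios 2) and 3).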
Figure 15 shows the relative $\ell^2$-error curves of these five methods. Note that we plot these curves as a function of the number of coefficients retained instead of the fraction of coefficients retained used in Figs. 10b and 12b. This is because the zero-padded version and the even-reflection version have 128 × 128 pixels, although the degrees of freedom remain 100 × 100 throughout the experiments.
Clearly, the graph-based methods, i.e., the graph Haar basis and the eGHWT best basis, outperformed the classical Haar transform applied to the three prepared versions of the input image. The implementation of dwt assumes the periodic boundary condition by default; if the input does not form a continuous periodic image, artificially large expansion coefficients are generated due to the discontinuity across the periodic boundary; see [38,42,53] for further information. We can summarize that our graph-based transforms can handle images of non-dyadic size without any artificial preparations (zero padding and even reflection), at some additional computational expense (the minimization of the PTV cost for the recursive partitioning).

Another Way to Construct a Graph from an Image for Efficient Approximation
We can view a digital image of size $m \times n$ as a signal on a graph consisting of $mn$ nodes by viewing each pixel as a node. Note that the underlying graph is not a regular 2D lattice of size $m \times n$; rather, it is a graph reflecting the relationship or affinity between pixels. In other words, $w_{ij}$, the weight of the edge between the $i$th and $j$th pixels in that graph, should reflect the affinity between the local regions around these two pixels, and this weight may be nonzero even if the $i$th and $j$th pixels are remotely located. In the classical setting, this idea has been used in image denoising (the so-called bilateral filtering) [50] and image segmentation [44]. Szlam et al. have proposed a more general approach for associating graphs and diffusion processes with datasets and functions on such datasets, which includes the bilateral filtering of [50] as a special case, and have applied it to image denoising and transductive learning. See the review article by Milanfar [27] for further connections between these classical and modern techniques and much more.
Here we define the edge weights as Szlam et al. [46] did:

$w_{ij} = \exp\left(-\dfrac{\|f[i] - f[j]\|^2}{\epsilon_f}\right)\exp\left(-\dfrac{\|x[i] - x[j]\|^2}{\epsilon_x}\right)$ if $\|x[i] - x[j]\| < r$, and $w_{ij} = 0$ otherwise, $\qquad(3)$

where $x[i] \in \mathbb{R}^2$ is the spatial location (i.e., coordinate) of node (pixel) $i$, and $f[i]$ is a feature vector based on the intensity, color, or texture information of the local region centered at that node. As one can see in Eq. (3), the pixels located within a disk of center $x[i]$ and radius $r$ are considered to be the neighbors of the $i$th pixel. The scale parameters $\epsilon_f$ and $\epsilon_x$ must be chosen appropriately. Once we construct this graph, we can apply the eGHWT in a straightforward manner.
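A direct (dense, quadratic-time) construction of this weight matrix can be sketched as follows; the helper name is ours, and the Gaussian-times-disk form follows our reading of Eq. (3), with the spatial Gaussian factor dropped when its bandwidth is infinite.

```python
import numpy as np

def image_affinity(features, coords, r, eps_f, eps_x=np.inf):
    """Symmetric weight matrix: Gaussian in feature distance times Gaussian
    in spatial distance, restricted to pixel pairs within spatial radius r.
    eps_x = inf reduces the spatial factor to the plain disk indicator."""
    n = len(coords)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d2 = np.sum((coords[i] - coords[j]) ** 2)
            if d2 >= r * r:          # outside the disk: no edge
                continue
            w = np.exp(-np.sum((features[i] - features[j]) ** 2) / eps_f)
            if np.isfinite(eps_x):   # optional spatial Gaussian factor
                w *= np.exp(-d2 / eps_x)
            W[i, j] = W[j, i] = w
    return W
```

In practice one would store W sparsely, since the radius restriction leaves only O(r^2) neighbors per pixel.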
We examine two images here. The first one (Fig. 16a) is a subsampled version of the standard 'cameraman' image; we subsampled the original cameraman image of size 512 × 512 to 128 × 128 in order to reduce the computational cost. For the location parameters, we used $r = 5$ and $\epsilon_x = \infty$. Note that $\epsilon_x = \infty$ means that the spatial factor becomes an indicator/characteristic function of the disk of radius $r$ centered at $x[i]$. This setup certainly simplifies our experiments, and can sparsify the weight matrix if $r$ is not too large. As for the feature vector $f[i]$ at the $i$th pixel, we found that simply choosing the raw pixel value as $f[i]$ gives good enough results for relatively simple images (e.g., piecewise smooth without too many high-frequency textures) like the cameraman image. However, the 'Gaussian bandwidth' parameter for the features, $\epsilon_f$, needs to be tuned more carefully. A possible tuning trick is to start from the median of all possible values of the numerator of the exponential term in Eq. (3), i.e., $\|f[i] - f[j]\|^2$ [46], examine several other values around that median, and then choose the value yielding the best result. For a more sophisticated approach to tuning the Gaussian bandwidth, see [24]. In this example, we used $\epsilon_f = 0.007$ and $\epsilon_f = 0.07$ after several trials starting from the simple median approach, in order to demonstrate the effect of this parameter on the eGHWT best basis. Figure 17 shows the eGHWT best-basis vectors corresponding to the nine largest expansion coefficients in magnitude. We can see that the human part and the camera part are captured by individual basis vectors. Furthermore, in terms of capturing the objects and generating meaningful segmentation, the results with $\epsilon_f = 0.007$ shown in Fig. 17a are better than those with $\epsilon_f = 0.07$ shown in Fig. 17b. This observation coincides with the better decay rate of the former than that of the latter in Fig. 16b. Therefore, we can conclude that good segmentation (or, equivalently, a good affinity matrix setup) is quite important for constructing a basis of good quality and, consequently, for approximating an input image efficiently.
Our second example is the composite texture image shown in Fig. 18a, which was generated and analyzed by Ojala et al. [31]. Our method, i.e., applying the eGHWT to an image viewed as a graph, allows us to generate basis vectors with irregular support adapted to the structure of the input image, as shown in the cameraman example above. This strategy also works well on this composite texture image if we choose appropriate features in Eq. (3) to generate a good weight matrix. The raw pixel values as features, which we used for the cameraman example, do not work on this composite texture image, which consists of highly oscillatory anisotropic features. Ideally, we want to use a piecewise-constant image delineating each of the five textured regions for the weight matrix computation. To generate such a piecewise-constant image (or rather a close approximation to it), we did the following:
1. Apply a group of (say, $K$) 2D Gabor filters of various frequencies and orientations to the original image of size 512 × 512;
2. Compute the absolute values of each Gabor-filtered image to construct a nonnegative vector of length $K$ at each pixel location;
3. Standardize these vectors so that each component has mean 0 and variance 1;
4. Apply Principal Component Analysis (PCA) [20] to these standardized vectors in $\mathbb{R}^K$;
5. Extract the first principal component (organized as a single image of the same size as the input image), normalize it to the range $[0, 1]$, and use it as the features in Eq. (3).
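The five steps above can be sketched in NumPy/SciPy as follows. This is a hypothetical implementation: the Gabor kernel construction, the fixed kernel size, and the function names are our assumptions; the PCA step is carried out via an SVD of the standardized (hence zero-mean) feature matrix.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(freq, theta, sigma, size=15):
    """Real-part Gabor kernel: isotropic Gaussian envelope times an
    oriented cosine of spatial frequency `freq` (unit: 1/pixel)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return env * np.cos(2.0 * np.pi * freq * xr)

def texture_mask(img, freqs=(0.2, 0.3, 0.4, 0.5), thetas=(np.pi/3, 5*np.pi/6)):
    """Steps 1-5: Gabor responses -> |.| -> standardize -> PCA -> first PC
    normalized to [0, 1], returned as an image of the same size as img."""
    resp = []
    for f in freqs:
        sigma = 3.0 / (np.pi * f) * np.sqrt(np.log(2.0) / 2.0)  # wavelength-dependent
        for t in thetas:
            resp.append(np.abs(convolve(img, gabor_kernel(f, t, sigma))))
    X = np.stack([r.ravel() for r in resp], axis=1)       # one vector per pixel
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)    # standardize components
    _, _, Vt = np.linalg.svd(X, full_matrices=False)      # PCA of zero-mean X
    pc1 = (X @ Vt[0]).reshape(img.shape)                  # first principal component
    return (pc1 - pc1.min()) / (pc1.max() - pc1.min() + 1e-12)
```

The default frequencies and orientations match the values reported in our experiments below; the resulting mask image is then used as the scalar feature in Eq. (3).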
In Step 1, Gabor filters are used because they are quite efficient at capturing high-frequency anisotropic features [10]. For this particular image, we used four spatial frequencies, 0.2, 0.3, 0.4, and 0.5 (unit: 1/pixel), and two orientations, $\pi/3$ and $5\pi/6$, for the Gabor filters. This generated a set of $K = 8$ nonnegative matrices in Step 2. We also note that the Gaussian bandwidth, i.e., the standard deviation of the Gaussian envelope used in the Gabor filters, needs to be tuned appropriately. In our experiments, we used the wavelength-dependent bandwidth $\sigma = \frac{3}{\pi\nu}\sqrt{\ln 2/2}$, where $\nu \in \{0.2, 0.3, 0.4, 0.5\}$ is the spatial frequency, as [33] suggests. The normalized first principal component in Step 5, viewed as an image, is shown in Fig. 18b. The pixel values of this 'mask' image are used as the (scalar) features $f[i]$ in Eq. (3). Note that this mask computation was done on the original 512 × 512 image, which was subsequently subsampled to 128 × 128 pixels to match the subsampled input image. As for the Gaussian bandwidth $\epsilon_f$ in Eq. (3), we set $\epsilon_f = 0.0005$ after several trials. The other parameters, $(r, \epsilon_x)$, were set to $(3, \infty)$. Figure 19a compares the performance of five different methods in approximating the composite texture image shown in Fig. 18a. The order of approximation performance is the same as in the previous examples, i.e., the vehicular volume counts on the Toronto street network in Fig. 10b and the Barbara image using the predefined dyadic partitions in Fig. 12b: eGHWT > GHWT f2c > graph Haar > GHWT c2f = graph Walsh. Figure 19b displays the top nine eGHWT best-basis vectors. We can see that the supports of these basis vectors approximately coincide with the five sections of the composite texture image, and some of the basis vectors, particularly the fourth and eighth ones, exhibit the oscillatory anisotropic features.

Conclusion
In this article, we have introduced the extended Generalized Haar-Walsh Transform (eGHWT). After briefly reviewing the previous Generalized Haar-Walsh Transform (GHWT), we described how the GHWT can be improved with the new best-basis algorithm, which generalizes the Thiele-Villemoes algorithm [49] to the graph setting. We refer to this whole procedure of developing the extended Haar-Walsh wavelet packet dictionary on a graph and selecting the best basis from it as the eGHWT. Moreover, we have developed the 2D eGHWT for matrix signals by viewing them as tensor products of two graphs, which generalizes the Lindberg-Villemoes algorithm [23] to the graph setting.
We then showcased some applications of the eGHWT. When analyzing graph signals, we demonstrated the improvement over the GHWT on synthetic and real data. For the simple synthetic six-node signal, we showed that the best basis from the eGHWT can be selected by neither the GHWT c2f dictionary nor the GHWT f2c dictionary, and that it had a smaller cost than the GHWT c2f/f2c best bases could provide. On the vehicular volume data on the Toronto street network, the eGHWT had the best approximation performance among the methods we considered. We then proceeded to applications in image approximation. After demonstrating the superiority of the eGHWT combined with the predetermined recursive partitioning on the original Barbara image of dyadic size, we proposed the use of data-adaptive recursive partitioning using the penalized total variation for images of non-dyadic size, and again showed the superiority of the eGHWT and the graph Haar transform over the classical Haar transform applied to preprocessed versions of a non-dyadic input image. Finally, we demonstrated that the eGHWT can be applied to a graph generated from an input image by carefully choosing the edge weights that encode the similarity between pixels and their local neighbors, to obtain not only superior approximation performance but also meaningful and interpretable basis vectors that have nonrectangular supports and can extract certain features and attributes of an input image.
The eGHWT basis dictionary is constructed upon a binary partition tree (or a tensor product of binary partition trees in the case of 2D signals). Currently, we use the Fiedler vectors of random-walk-normalized Laplacian matrices to form the binary partition tree. However, as we have mentioned earlier, our method is flexible: any graph cut method or general clustering method can be used, as long as a binary partition tree is formed.
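One step of this recursive bipartitioning can be sketched as follows. This is a simplified illustration, not our package's implementation: it computes the Fiedler vector of $L_{\mathrm{rw}} = I - D^{-1}W$ via the equivalent symmetric eigenproblem (for numerical stability) and splits the nodes by sign.

```python
import numpy as np

def fiedler_bipartition(W):
    """Bipartition a connected weighted graph (symmetric weight matrix W)
    by the sign of the Fiedler vector of L_rw = I - D^{-1} W.
    Eigenvectors of L_rw are D^{-1/2} times those of the symmetric
    normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    Dsqinv = np.diag(1.0 / np.sqrt(d))
    Lsym = np.eye(len(d)) - Dsqinv @ W @ Dsqinv
    vals, vecs = np.linalg.eigh(Lsym)        # eigenvalues in ascending order
    phi = Dsqinv @ vecs[:, 1]                # Fiedler vector of L_rw
    return phi >= 0                          # boolean membership mask
```

On a "barbell" graph made of two triangles joined by a weak edge, the sign split recovers the two triangles, as expected.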
If one is interested in compressing a given graph signal instead of simply approximating it, where the encoding bitrate becomes an important issue, the eGHWT should still be useful and efficient. There is no need to transmit the eGHWT best-basis vectors. If both the sender and the receiver have the eGHWT algorithm, then the only information the sender needs to transmit is: 1) the input graph structure via its adjacency matrix (which is often quite sparse and efficiently compressible); 2) the graph partitioning information needed to reproduce the hierarchical bipartition tree of the input graph; 3) the indices of the eGHWT best basis within this tree; and 4) the retained coefficients (after appropriate quantization) and their indices within the eGHWT best basis.
Another major contribution of our work is the software package we have developed. Based on the MTSG toolbox written in MATLAB® by Jeff Irion [16], we have developed the MultiscaleGraphSignalTransforms.jl package [13], written entirely in the Julia programming language [1], which includes the new eGHWT implementation for 1D and 2D signals as well as the natural graph wavelet packet dictionaries that our group has recently developed [4]. We hope that interested readers will download the software and conduct their own experiments with it: https://github.com/UCD4IDS/MultiscaleGraphSignalTransforms.jl.
The readers might feel that the graphs we have used in this article are rather restrictive: we have only used 2D irregular grids (i.e., the Toronto street map) and 2D lattices (i.e., standard digital images). However, we want to note that applying the eGHWT to more general graphs (e.g., social network graphs) is quite straightforward, and no algorithmic modification is necessary as long as an input graph is simple, connected, and undirected. As we have discussed earlier, the performance of the eGHWT, even for such general graphs, is always superior to that of the GHWT, the graph Haar basis, and the graph Walsh basis. We strongly encourage interested readers to try out our software package described above on such general graph signals.
Many related projects remain. The most urgent one is to implement a version of the eGHWT that can handle multiple graph signals (on a given fixed graph). In the classical setting, Wickerhauser proposed the so-called Joint Best Basis (JBB) algorithm [52, Sect. 11.2], while Saito proposed the so-called Least Statistically-Dependent Basis (LSDB) algorithm [37]; these correspond to fast, approximate versions of PCA and ICA, respectively. Moreover, our group has also developed the Local Discriminant Basis (LDB) algorithm, which can extract distinguishing local features for signal classification problems [39,41]. We have already implemented versions of the HGLET and the GHWT that can handle multiple graph signals and compute the JBB, the LSDB, and the LDB. Hence, making such a version of the eGHWT is relatively straightforward, and we plan to do so very soon.
Another important project, which we are currently pursuing, is to use an appropriate subset of the scaling vectors in the eGHWT dictionary to estimate a good initialization for the Nonnegative Matrix Factorization (NMF) algorithms, which are of an iterative nature; see, e.g., [29].
Finally, as a long-term research project, we plan to extend the GHWT and eGHWT beyond matrix-form data, i.e., to tensorial data (see, e.g., [34]), which seems quite promising and worth trying.
We plan to report our investigation and results on these projects at a later date.

Lemma 1 (Thiele & Villemoes [49]) Let $R \subset S$ be a dyadic rectangle of area greater than or equal to two, with left half $R_{\mathrm{left}}$, right half $R_{\mathrm{right}}$, lower half $R_{\mathrm{low}}$, and upper half $R_{\mathrm{up}}$. Assume each tile $p \subset R$ has the cost $\kappa(p)$. Define $\kappa(R) := \min\{\kappa(R_{\mathrm{left}}) + \kappa(R_{\mathrm{right}}),\ \kappa(R_{\mathrm{low}}) + \kappa(R_{\mathrm{up}})\}$. Then $\kappa(R)$ is the minimal cost among all admissible tilings of $R$.

Fig. 5: (a) The GHWT c2f dictionary on $P_6$. Stem plots with black, red, and blue colors correspond to the scaling, Haar, and Walsh vectors, respectively. (b) The relabeled GHWT c2f dictionary produced by applying Algorithm 2 to the GHWT c2f dictionary shown in (a). The gray stem plots indicate the "fictitious" (or "non-existent") basis vectors newly generated by Algorithm 2.

with its cost ≈ 8.28, and the GHWT f2c best-basis representation is
Fig. 10: (a) Vehicular volume data in the city of Toronto (the node diameter is proportional to its volume count for visualization purposes); (b) Relative $\ell^2$ approximation errors by various graph bases

Fig. 11: Nine most significant basis vectors of the graph Haar basis, the GHWT c2f best basis (= the graph Walsh basis in this case), the GHWT f2c best basis, and the eGHWT best basis. The diameters of the nodes are proportional to the logarithm of the absolute values of the basis-vector amplitudes. The basis-vector amplitudes within (−0.001, 0.001) are mapped to the viridis colormap to emphasize the difference between positive (yellow) and negative (deep violet) components.

Fig. 13: Approximations of the Barbara image using various bases, using only 3.125% of the coefficients (online viewing is recommended for the details)

Fig. 14: Zoomed-in views of the face and left leg regions of the approximations shown in Fig. 13

Fig. 15: Comparison of approximation performance on the face part (of size 100 × 100 pixels) of the Barbara image: the classical Haar transform with various setups, the graph Haar basis, and the eGHWT best basis

Fig. 18: (a) A composite texture image of size 128 × 128 after subsampling the original image of size 512 × 512; (b) Mask image computed from the PCA of 8 Gabor-filtered images, followed by subsampling to 128 × 128 pixels

Fig. 19: (a) Relative $\ell^2$ approximation error of Fig. 18a using five methods; (b) Top 9 eGHWT best-basis vectors

Ratio Cut and Normalized Cut are defined by $\mathrm{RatioCut}(A, B) := \mathrm{cut}(A, B)\left(\frac{1}{|A|} + \frac{1}{|B|}\right)$ and $\mathrm{Ncut}(A, B) := \mathrm{cut}(A, B)\left(\frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}\right)$, where $\mathrm{cut}(A, B)$ is the sum of the weights of the edges connecting the node subsets $A$ and $B$, $|A|$ is the number of nodes in $A$, and $\mathrm{vol}(A)$ is the sum of the degrees of the nodes in $A$.