Introducing the locally stationary dual-tree complex wavelet model

This paper reconciles Kingsbury's dual-tree complex wavelets with Nason and Eckley's locally stationary model. We here establish that the dual-tree wavelets admit an invertible de-biasing matrix and that this matrix can be used to invert the covariance relation. We also show that the added directional selectivity of the proposed model adds utility to the standard two-dimensional local stationary model. Non-stationarity detection on random fields is used as a motivating example. Experiments confirm that the dual-tree model can distinguish anisotropic non-stationarities significantly better than the current model.


Introduction
This paper has been completed by the second, third and fourth authors following the untimely death of the first author, James Nelson, in September 2016. It has been submitted for publication in his memory.
Building on the non-stationary time-series work of Dahlhaus (1997) and first proposed by Nason et al. (2000), the locally stationary wavelet (LSW) model provides an overcomplete, translation-invariant representation of a large class of non-stationary time series. Unlike many other such tools, LSW-based methodology affords a statistically wellprincipled means to capture the local covariance and local spectrum. That auto-covariance estimation of non-stationary processes by any other means comes entangled with various fundamental difficulties has enabled LSW models to gain traction across a variety of domains such as forecasting for finance (Fryzlewicz 2005); establishing dependencies in electrophysiological data for neuroscience applications (Sanderson et al. 2010); and spectral estimation of environmental time series and ECG traces with missing data (Knight et al. 2012).
Recent years have seen much basic theoretical development of locally stationary processes. For example, work has focussed on: variants of the smoothness constraints placed on the local spectrum function, such as the work by van Bellegem and von Sachs (2008), Fryzlewicz and Nason (2006), and Nason and Stevens (2015); confidence intervals for the empirical local covariance as recently derived by Nason (2013); and changepoint estimation, such as the work of Killick et al. (2013) and Cho and Fryzlewicz (2015).
Of particular interest here is the extension of the LSW theory to two-dimensional processes as recently driven by Eckley et al. (2010) and Eckley and Nason (2011). In particular, they extended the result that the local auto-covariance converges uniformly to auto-covariance as sample size increases and that there is an invertible linear mapping from one to the other. This work was subsequently followed by methodology applied to non-stationarity detection in image textures by Taylor et al. (2014) and segmentation of imagery into stationary regions by Nunes et al. (2014). A regularised smoothing strategy for the two-dimensional LSW process is developed in Gibberd and Nelson (2016). Very recently, Taylor et al. (2017) combined the lattice and multivariate extensions to formulate an LSW model for multivalued image data. As such, this can be exploited to compute multiscale estimates of the local inter-spectral covariance structure of multispectral imagery. The model was then applied to the task of classification of colour image textures.
The locally stationary wavelet model is hence becoming a flexible, extensible means to capture rich behaviour in spatial and image data and to solve practical problems. The overcompleteness of the LSW model, compared to, say, decimated wavelet schemes, is a key property which enables the LSW to elicit the auto-covariance. The increased number of wavelet coefficients permits a multiscale sample autocovariance of sorts to be computed about any point in space. This is problematic in the decimated case since the periodogram is computed over the dyadic grid, and as Nason et al. (2000) notes, the decimated LSW model fails to accommodate all stationary processes (von Sachs et al. 1998). This contrasts with the non-decimated LSW model, which can describe any stationary process with finite integrated autocovariance.
Overcompleteness not only affords a feasible means to estimate auto-covariance but also, as a side benefit, results in a translation-invariant periodogram. In effect, it provides the same information as a decimated transform at arbitrary integer shifts of the input data.
The main thrust of our work herein is to posit that further overcompleteness may provide extra value. In particular, when considering random fields such as image data defined over the lattice, directionality can be of significant importance. Often such data present an anisotropic covariance structure. A simple example of an anisotropic field is illustrated in Fig. 1. Here the piecewise non-stationarity is due to a simple relative rotation in the auto-covariance function, between the inner rectangle and outer annulus. In image processing, the ability to distinguish between such directionality of textures greatly aids tasks such as classification and segmentation.
Unfortunately, traditional real-valued (mono) wavelets such as the Daubechies family, favoured by the LSW theory, possess very limited directionality. This constrains their AE(90) AE (70) AE(50) AE(30) Fig. 1 Anisotropically non-stationary textures drawn from AE(θ) process, as defined in Sect. 4.2.2. Processes comprise an inner and outer region with different orientations, cf. Nelson and Gibberd (2016) ability to model and represent highly directional phenomena typical in most modern image processing problems. As such, this has given rise to many so-termed 'directional wavelets', including the curvelets of Starck et al. (2002), the contourlets of Do and Vetterli (2002), the steerable pyramids of Portilla and Simoncelli (2000) and the monogenic Riesz-Laplace wavelets of Unser et al. (2009). We here investigate the incorporation of a special directional wavelet basis, namely the dual-tree complex wavelets (Kingsbury 2001;Selesnick et al. 2005), into the locally stationary wavelet framework. It is shown that, since these complex wavelets are constructed as natural extensions of real-valued wavelets, much of the existing LSW theory can be developed by considering a complex-valued variant of the LSW framework. Indeed, in the sense that the dual-tree complex wavelet framework is a multiwavelet tight frame of order two, it can be considered 'closer' to the usual real-valued wavelets than some of the other more exotic and redundant directional wavelets, and as such, it provides a natural means to extend enhanced directionality into the LSW framework.
In Sect. 2, we propose the dual-tree complex locally stationary wavelet processes, denoted hereafter as LSW(C). The processes are real-valued but the definition quite naturally permits complex-valued wavelets and transfer functions-hence extending the usual LSW model. An additional property, taken straight from dual-tree complex wavelet theory, is placed on the relationship between the real and imaginary parts of the wavelet filters. This stipulates that they form a Hilbert pair and furnishes the wavelets with directional selectivity and polarity. Theorem 1 transfers a central result from the LSW case that the local auto-covariance converges to auto-covariance for LSW(C) processes.
Some estimation theory is covered in Sect. 3. We provide results that show the LSW(C) periodogram is a biased estimator of the spectrum. Furthermore, we establish the result that, under reasonable conditions, the biasing matrix can be inverted; equivalently, the corresponding auto-correlation wavelets retain linear independence; that the spectrum is uniquely defined, given the LSW(C) process; and that the mapping between spectrum and covariance is invertible.
In Sect. 4, simulations and experiments on a recent stationarity detection method confirm the utility of the enhanced directionality afforded by the proposed model.
Thus, for the first time, our work reconciles two significant contributions to the field of wavelet analysis, combining the locally stationary wavelet model with the family of dual-tree complex wavelets.

The complex locally stationary wavelet model
The dual-tree complex wavelet transform (Kingsbury 2001;Selesnick et al. 2005) employs a family of carefully constructed wavelet functions with the remarkable property that they jointly maintain approximate translation invariance and admit a full multiresolution analysis in the sense that the resulting two-scale relations are satisfied with finite sequence filters. This enables an implementation of a nearshift-invariant discrete wavelet transform at around the cost of two decimated, fast discrete wavelet transforms which can be computed in parallel.
The two decomposition trees are interpreted as the real and imaginary parts of the complex wavelet coefficients. The near-translation-invariance property is achieved by designing the wavelet filter sequences in such a way that they form an approximate Hilbert transform pair-where the real and imaginary parts are 90 • out of phase with each other. That such invariance is achieved in concert with a genuine multiresolution analysis places dual-tree wavelets in a very special and unique category in what many refer to informally as the 'wavelet zoo'.
In 2-D, the dual-tree scheme also naturally gives rise to improved directionality relative to regular, real-valued wavelets [such as, for example, Daubechies wavelets used by Eckley et al. (2010)]. Figure 2 contrasts the directionality afforded by a regular, real-valued, discrete 2-D wavelet (or impulse response) function and that of a dual-tree com- plex wavelet function. The real-valued wavelet function comprises filters oriented at 0 • and 90 • to the horizontal in addition to a filter oriented at both ± 45 • . This third 'diagonal' filter will merge together any content oriented at ± 45 • -a practitioner will therefore be unable to distinguish between these two directions by studying the magnitude of the wavelet coefficients. For this reason, at best, one could perhaps argue that this wavelet function has a directionality of 2.5. On the other hand, Fig. 2 also illustrates that the dual-tree complex wavelets offer six fully directional filters, oriented at {(30 − 15) • } 6 =1 . Furthermore, the complex nature of the filters also yields polarity, or phase, information about image edges and singularities.
The resulting dual-tree complex wavelet impulse responses resemble the multiscale directional band-pass filters that the V1 human/mammalian cortical filters use as a front-end process to their visual systems far more faithfully than the directionally limited conventional wavelets-in their experimentally led work Hubel and Wiesel (1962) first remarked on the strong responses to directionality in mammalian vision and, more recently, Ng et al. (2006) provide a review which includes a discussion of the analogous role of wavelets with respect to the underlying biology.
In practice, the directionality offers far richer information than the classical, real-valued wavelet functions. As such, dual-tree complex wavelets and their various extensions continue to find hundreds of applications in image modelling and processing areas such as variational Bayesian enhancement (Zhang and Kingsbury 2015), registration and fusion for atmosphere correction (Anantrasirichai et al. 2013), segmentation via semi-local scaling exponent estimation , and many more.
Eckley's 2-D local stationary wavelet (LSW) model describes a stochastic process or field over the finite lattice T := [[1, T ]] 2 as a dynamically weighted filtering. This combines a real-valued weighting sequence w j of a centred orthonormal increment sequence ξ j , and an overcomplete basis comprising what Nason et al. (2000) and Eckley et al. (2010) term (undecimated) 'discrete wavelet functions' ψ j (·) ∈ 2 (Z 2 ), indexed over both scale j and orientation . As usual, the '2-D' wavelet functions are separable and are constructed from '1-D' mother and father wavelets where the discrete mother wavelet functions are defined by the recursions for j > 0 and s ∈ [[1, T ]], with ψ 0 (s) = h 1 (s), where the h 0 and h 1 are, respectively, the low-and high-pass filter sequences of the wavelet ψ. Similar recursions hold for the father wavelets, cf. Daubechies (1992) and Eckley et al. (2010). As noted by Nason et al. (2000), these discrete wavelet functions are not (necessarily) the result of discretising a continuous wavelet function. In fact, the recursions described by (2) are equivalent to setting all wavelet coefficients, except one at the jth finest scale level, to zero and then performing an inverse decimated wavelet transform with upsampling. Further discussion on the construction of these sequences, and their numerical generation, can be found in (Daubechies 1992, p. 204). The LSW can be loosely described as a multiscale, dynamic, moving average model. For only one scale j 0 and direction 0 , say, the LSW becomes a moving average model with weights that can evolve over space via the sequence w = {w(k)} k . If, in addition, the weights are constant over space w = w 0 , say, then the LSW collapses down to a simple moving average model with parameters {w 0 ψ 0 j 0 (k)} k . For the sake of controlling notational clutter, we fuse the scale and orientation indexes into one single index η = η( j, ) ∈ H ≡ {1, . . . , 6J }. Without loss of generality, put We further note that the LSW can be equivalently stated in terms of convolutions X = η∈H w η ξ η * ψ η , where the · denotes a left-right and up-down spatial flip: ϕ := ϕ(−·). Aside from providing an alternative convenient means of conceptualising the model, introducing such operator notation also helps mitigate notational clutter in some of the later proofs.
Definition 1 places further constraints on how quickly the weighting sequence can vary over the spatial support. We here transfer all of the basic real-valued LSW modelling assumptions in Nason et al. (2000) and Eckley et al. (2010) over to our (dual-tree) complex wavelet case. The differences are that (i) the transfer sequences w and transfer functions W , below, are now complex-valued, and (ii) Property 5 is added in order to exploit the approximate Hilbert transform pair property of the dual-tree complex wavelets and therefore produce wavelets which are directionally selective and which carry phase information.
Unlike the LSW model, the DTCWT, as originally designed, is decimated. Instead, we here incorporate a nondecimated version of the DTCWT into the LSW. Although the resulting extra computation has little affect on translation invariance (the DTCWT is already near-shift invariant), the dual-tree complex wavelets do greatly enhance the poor directional description suffered by the LSW model and the non-decimated construction provides a simple mechanism to introduce them into the LSW framework. Undecimated versions of the DTCWT have recently been proposed by Hill et al. (2015) who, in addition, note the added virtues that the resulting wavelet coefficients are truly co-located over the scale levels rather than lying on the usual dyadic grid. Although we do note that a big attraction of the DTCWT is the ability to perform a decimated transform, the incorporation of decimation, partially or wholly, into the LSW is left as further work for now.

Definition 1
The locally stationary dual-tree complex wavelet stationary processes are defined as the class LSW(C) of all stochastic processes X : T → R over the finite lattice T := [[1, T ]] 2 which satisfy the following representation in the mean-squared sense: with complex-valued sequences w η ∈ 2 (T ), centred orthonormal increment processes ξ η : T → R, discrete complex wavelet functions ψ η : T → C, and where there exist functions W η : T 2 := (0, 1) 2 → C, and constants C η , λ η ∈ R such that all the following properties hold for all η ∈ H.
5. The complex wavelet functions satisfy the approximate Hilbert pair property, cf. Kingsbury (2003) and Selesnick et al. (2005), in that its real ψ ·;0 and imaginary ψ ·;1 parts satisfy: ψ = ψ ·;0 + iψ ·;1 , with ψ ·;0 , ψ ·;1 : T → R, and where the associated filter sequences h a b satisfy the below properties, where b = 0 or b = 1 signifies the low-pass or high-pass band sequences, respectively, and where a = 0 denotes the real part of the tree and a = 1 the imaginary part of the tree: and where n is the length of the filter sequence.
Note that the approximate Hilbert pair property between ψ .;0 and ψ .;1 ensures that the negative half of the spectrum of ψ becomes approximately zero. This is the key to obtaining directionally selective filters in 2-D from separable 1-D filters, as explained in Selesnick et al. (2005). The collection of functions {W η } comprise the parameters of the LSW model. They determine how much autocorrelation is present at each scale level. As T → ∞ and the number of sample points increases, then properties 1 and 2 above associate the collection of transfer sequences {w η } with, what become, the bounded family of transfer functions {W η }. Properties 3 and 4 stipulate that the transfer function should be smooth. Asymptotically, the weights are associated with Lipschitz continuous functions W η : T 2 → C, defined over rescaled space z = t/T , t ∈ T . This condition restricts the variation of the weights over space and thus captures the fact that the local structure becomes more stationary when observed over evermore smaller spatial neighbourhoods.
A central premise, and unique selling point, of the LSW model is that it very naturally offers access to a measure of local covariance via the local spectrum and the autocorrelation wavelet.
Definition 2 (Complex ACW ) Let the superscript · denote complex conjugation. Define the (complex) auto-correlation wavelet ACW associated with the possibly complex-valued, discrete wavelet ψ : N 2 → C by Definition 3 (Local wavelet spectrum) Let X ∈ LSW(C) have transfer functions W η : T 2 → C. Then, the local wavelet spectrum (LWS) of X is defined by Definition 4 (Local auto-correlation) Let S be the local wavelet spectrum, defined by Definition 3, of an LSW(C) process (Definition 1) and let Ψ be the auto-correlation wavelet as described by Definition 2. Then, the local covariance is defined by A remarkable property is that this local covariance forms a good estimate for the auto-covariance of the, possibly nonstationary, LSW process. The next result establishes this fact for the locally stationary dual-tree complex wavelet processes.
Theorem 1 (Convergence of auto-covariance) Let C and C T be, respectively, the local auto-covariance and autocovariance of an LSW process X ∈ LSW(C). Then as T → ∞, uniformly in t ∈ Z 2 , and z ∈ T 2 .
Remark 1 (Complex extension of the LSW) For X ∈ LSW(C): 1. The auto-correlation wavelet is defined via the natural complex extension of the auto-correlation operator. 2. The local wavelet spectrum quantity works in much the same way as it does in the real LSW case only now it is formed by computing the complex modulus of the transfer function.

Estimation theory
The (local wavelet) spectrum S represents: the parameters of the LSW(C); the amount of 'energy' present at a particular scale and orientation; and, via Theorem 1 the spectrum also establishes the second-order behaviour of the field. Two key questions naturally follow. The first is how the spectrum can be estimated and, second, is whether the spectrum is uniquely defined, given a field. The (undecimated) local wavelet periodogram (LWP) offers a straightforward but naive estimate of the spectrum. As Theorem 2 shows, the redundancy in the LWS model spreads content, via a so-termed biasing matrix across the periodogram scales.
Definition 5 (LS-DTCW biasing matrix) Let Ψ be the complex ACW as above. Then, the associated biasing matrix is Theorem 2 (Periodogram bias) Let the LSW(C) process

undecimated, dual-tree complex wavelet basis and let A be the associated biasing matrix. Then
The biasing matrix is the agent of this spread or mixing of redundancy. Note that Definition 5 is but the complex extension of the inner product used to define the real biasing matrix of the locally stationary, real-valued, wavelet model. In addition to facilitating some subsequent results, it is perhaps of intrinsic interest to note that the biasing matrix is real-valued.
Lemma 1 (Bias is real) Let Ψ be the ACW (Definition 2. The entries of the biasing matrix (Definition 5) are positive, real, and symmetric.
Remark 2 (Unbiased spectrum estimator) If the biasing matrix is invertible, then one can derive an unbiased estimate of the spectrum via: The consistency of the smoothed version of the real-valued wavelet periodogram is established and discussed by Nason et al. (2000) in the 1-D case and Eckley et al. (2010) in the 2-D case. A proof of consistency of the Nadaraya-Watson kernel estimator (cf. Nadaraya 2016; Watson 1964) is given by Taylor et al. (2017).
The existence of the debiasing matrix-the inverse of the biasing matrix-rests solely upon the choice of wavelet. Nason et al. (2000) proved existence for the 1-D Daubechies wavelets, and Eckley et al. (2010) extended this to the 2-D Daubechies wavelets. A similar result, for other wavelet families, has remained a conjecture and open problem ever since Nason et al. (2000) carried out their original work.
Instead, we can here report that, in practice, the debiasing matrices exist for 2-D locally stationary dual-tree wavelets with data sizes 2 n × 2 n for n ∈ [[1, 9]]. Furthermore, we present the following result that, given the existence of a debiasing matrix associated with a generic 1-D wavelet function, the corresponding dual-tree wavelet function will also have an associated debiasing matrix.
An equivalent result holds which reveals the connection between the invertibility of the biasing operator and the linear independence of the auto-correlation wavelet.
Remark 3 (Invertibility and ACW independence) Since the matrix A is a Gramian, the linear independence of Ψ is equivalent to the non-singularity of A.
Crucially, linear independence naturally gives rise to uniqueness.

Corollary 1 (LWS uniqueness)
The LWS is uniquely defined, given the corresponding LSW(C) process.
For completeness, and analogous to the real-valued wavelet case, there is also an inversion property between the spectrum and local auto-covariance function.

Experiments
The advantages of the additional directionality afforded by the dual-tree complex wavelets are manifested in various ways according to the task at hand. We here consider just a few numerical experiments, but note that, since the non-LSW form of the dual-tree complex wavelet transform has spawned hundreds of applications in signal and image processing, there are many other such potential applications of this LSW-based model. We focus here on a couple of examples that have been discussed specifically in the LSW literature. Firstly, we qualitatively demonstrate the broader model space provided by the LS-DTCW. Numerical evidence is then offered towards the benefits of greater directionality to the task of non-stationarity detection.

Simulating anisotropic stationary random fields
An attractive property of the LSW framework is that, given a known local wavelet spectrum S, one can readily draw simulated fields from the prescribed models. In the absence of knowledge about S, a good alternative is to perform estimation of the spectrum. Thus, if one can both estimate S and use this approximation to simulate a process, then it is possible to informally and qualitatively compare the two model spaces for LSW and LS-DTCW. In the proceeding experiment, a simple texture is simulated using one model (either LSW or LS-DTCW) and then the other model is used to estimate the spectrum of the simulated process. Using the estimate, an attempt to 'resimulate' the texture from this spectrum can be made.
As described in Sect. 3, the spectrum can be estimated in a pointwise fashion even when the field in question is nonstationary. Algorithm 1 provides the required simple recipe. First of all, the periodogram |X * ψ η | 2 of the random field X is computed. This is merely a double convolution over space. Since the wavelets considered here, either Haar in the LSW case or q-shift in the LS-DTCW case, are fairly short-2-taps for Haar, and around 12 or more taps for q-shift-this is not a significant computational task. For larger filters and, especially, those with infinite support, speed-ups can be obtained by performing the convolution in the Fourier domain. Either way, the periodogram is then bias-corrected by applying the inverse mapping: . For each pixel t, this is equivalent to applying a H × H matrix transform to a vectorised form of the periodogram, namely X ∼ 1 (t), . . . , X ∼ H (t) ∈ R H , with H := #H. In the interests of comparing the extra directionality provided by the LS-DTCW with that of the LSW, it is instructive to, first of all, consider a simple stationary case. In Fig. 3, the sub-figures labelled 'simulation', with indexes j and are fields simulated by setting S η (·) ≡ 1 for the particular stated value of η = j + ( − 1)J , where j is the jth coarsest scale level, and is the directional sub-band such that: = 1 refers to a horizontal stripe direction, = 2 a vertical stripe direc-tion, and = 3 refers to the 'diagonal' sub-band direction. All other values of S η (for any other ( j, )) are set to zero and the field is simply drawn from the model via X = S η ξ η * ψ η , with ξ η ∼ N (0, 1).
For example, the processes illustrated at the top of the first column of Fig. 3 are simulated using the simple piecewise construction Similarly, the processes depicted in the third column of the first row have a spectrum equal to δ ,3 δ j,4 across the field. Of course, strictly speaking, this contravenes the Lipschitz continuity of the spectrum, as per Definition 1, Property 3 but, nonetheless, it serves as a simple, intuitive example. The directional sub-band = 1, here refers to a horizontal stripe orientation. As such, horizontally oriented waves of texture should be more perceptible than vertical ones in any of the three 'LSW simulation' sub-figures with = 1. Similarly, the third column of sub-figures depicts LSW random fields oriented in the 'diagonal' direction. Likewise, here, waves of texture oriented at ±45 • should be evident.
In this simple simulation scenario, the stationarity of the resulting fields means that spectrum estimates can be improved by simply averaging the spectrum estimate over the entire spatial support. Algorithms 1 and 2 describe the recipes required to first estimate the spectrum and then, secondly, resimulate the texture. The second and fourth columns of Fig. 3 illustrate the LS-DTCW resimulations of the LSW textures in columns one and three, respectively.
It can be seen, for example, that the second column reflects the horizontal wave directions of the first column in the appropriate levels of scale fairly well. Likewise, the waves oriented at ± 45 • in the third, LSW simulated, column are also present in the fourth, LS-DTCW resimulated, column.
In contrast, Fig. 4 shows the reverse of this experiment where, firstly, textures are simulated from the LS-DTCW model and then, secondly, they are estimated and resimulated using the LSW model. The processes plotted in the first column have textures oriented at 15 • to the horizontal axis. However, owing to its poor directionality and directional selectivity, the LSW is unable to convincingly capture this behaviour. Likewise, the third column depicts LS-DTCW textures orientated at 45 • to the horizontal axis. Again, the LSW cannot resimulate textures that resemble anything similar to the original textures. In short, in this example, the LS-DTCW can capture behaviour seen in the LSW model but not vice versa. Since more complex textures can be constructed from linear combinations of these simple building block examples, it could therefore be argued that the LS-DTCW model is, broadly speaking, more general than the LSW model. This is perhaps not surprising since the LS-DTCW model comprises twice the number of wavelet atoms than the LSW model, but it is interesting to note that the difference manifests quite decidedly when considering the directionality of the texture.

Algorithm 1 Unbiased spectrum estimation, LWSE, cf. Taylor et al. (2014)
Algorithm 2 Simulation algorithm, LWSIM, cf. Taylor et al. Taylor et al. (2014) explored the role of the LSW to the problem of non-stationarity detection in images. This was applied to the problem of machine-vision detection of pilling effects in fabrics. Since the LSW provides a model which can be simulated from, Taylor et al. (2014) was able to derive a hypothesis testing procedure to determine stationarity. However, the pervasiveness of anisotropic non-stationary random fields in image processing motivates the extension of this test to our locally stationary dual-tree complex wavelet framework. Such an extension recognises the fact that nonstationarity can be caused, at least in part, by a rotation of the auto-covariance function, cf. Fig. 1.

Stationarity testing
More generally, detection and characterisation of such directional non-stationarity would be of interest to image processing tasks such as segmentation ; denoising or decluttering of highly directional textures such as sand ripples in sonar imagery (Nelson andKingsbury 2010, 2012); or change detection in remote sensing imagery (Hussain et al. 2013). Non-stationarity detection is also closely aligned with the saliency detection task (Raj et al. 2007). This open problem in image processing is a cru- cial and profound challenge towards higher-level automatic semantic understanding of image data. Taylor et al. (2014) extended ideas from time series to use the LSW model to perform stationarity hypothesis testing on random fields. Bootstrapping forms the central premise. The sample variance of the observed field is compared to that of series of simulated fields, where the simulations are drawn under the assumption that the observed field is stationary. Algorithms 1 and 2 can therefore be employed to perform the necessary spectrum estimation and simulation. The basic intuition is that, if the observation is stationary, then it will have a variance drawn from the same population as the simulations. If it is non-stationary, the observed variance will, on average, tend to be greater than the variances of the (stationary) simulations. Repeatedly drawing simulations from the hypothesised model then yields a p value. The advantage of this approach is that the distribution of the process need not be known analytically.
We follow the spirit of Taylor et al. (2014) and perform two sets of experiments. The size and power were assessed using a selection of stationary and non-stationary random fields, respectively. Unlike Taylor et al. (2014), and since one of our interests here is the added directionality afforded by the LS-DTCW model, we include some anisotropic models.

Algorithm
Unlike Nelson and Gibberd (2016) who concentrated on the simpler case where the null hypothesis is rejected based on the average spectrum sample variance statistic over all scales and orientations, we follow the multiple hypothesis testing framework. Here, instead, the null hypothesis is tested for all scale and orientation pairs individually and is rejected if any of the spectra sample variances (over η) deviates significantly from constancy: As in Taylor et al. (2014), we draw on the multiple hypothesis testing bootstrapping framework constructed by Davison and Hinkley (1997) to accommodate any distributional differences and dependencies in the spectrum sample variance statistics over the different scales and orientations. To this end, if π η is the tail probability of the statistic τ (η) for scale/orientation pair η, then the multiple hypothesis framework uses the test statistic q = min η π η . Algorithm 3 lays out the hypothesis testing procedure.

Data
Three stationary processes were used, namely: 1. A simple, white noise process X ∼ N (0, 1). 2. A spatial moving average, order one process where (t ik ) k∈∂(i) is the four-neighbourhood of the point t i .
Furthermore, three non-stationary fields were considered, viz.: 1. A piecewise constant stationary random field montage of two moving average, order one fields MA(1), denoted by MA (1; σ ), where one half has an innovation process with var = 1 and the other half has var = σ . 2. An anisotropic non-stationary field with exponential covariance with D = Diag(4, 1) and where the rotation matrix R θ (t) is piecewise constant with where T 0 is a central square region, cf. Fig. 1. The difference in rotation between the inner and outer regions, namely θ , was varied in the experiments to furnish eight non-stationary processes with Remark 4 Note that processes AE(θ ) are stationary for θ = 0 and non-stationary for any other value. The non-stationarity is due to the angular difference in the orientation of the textures in the inner and outer regions.

Stationarity results
We here compare the locally stationarity (real-valued) wavelet test with that of the locally stationary dual-tree complex wavelet test. For completeness, in addition to the multiple hypothesis tests, denoted by MH in Tables 1, 2, 3  Bold values highlight the best performance amongst methods under analysis   Bold values highlight the best performance amongst methods under analysis we also include results using the simpler, but less accurate, single-hypothesis test (denoted SH) as described by Taylor et al. (2014) and, later, Nelson and Gibberd (2016).
All multiple hypothesis experiments here below were conducted on 250 × 250 bootstraps (K = 250, M = 250 in Algorithm 3) and 100 instances of each image. The single hypothesis tests were conducted on 250 bootstraps.
Hence, for example, Table 1 records that, two out of 100 white noise processes caused a rejection of the null hypothesis at the 5% level when using the real-valued LSW multiple hypothesis test. The single-hypothesis tests were very conservative, triggering no null-hypotheses at all.
The dual-tree multiple hypothesis test managed to reach a figure closest to the ideal value of 5%. The tests on the MA(1) processes resulted in similar outcomes. Interestingly, unlike the real-valued LSW, the dual-tree single-hypothesis tests resulted in a less conservative figure for the anisotropic stationary process AE(0) experiments than it did for the isotropic processes.
The conservatism of the single-hypothesis test, relative to the multiple hypothesis test, broadly aligns with observations by Cardinali and Nason (2010) and Taylor et al. (2014).
The three non-stationary examples provide a means to assess the power of the stationarity tests. Table 2 provides the number of null-hypothesis rejections for 100 instances of the AE(θ ) process, per angle parameter θ . In a sense, this process becomes increasingly stationary as the angle θ is decreased away from 90 • towards 0 • .
It can be seen that, as expected, all tests improve as the angle increases. However, there is a very clear difference in the rate at which each test improves. The multiple hypothesis LS-DTCW is the most sensitive. The multiple hypothesis tests again obtain better (more sensitive) results than their respective single-hypothesis versions. Interestingly, however, the single-hypothesis version of the LS-DTCW obtains better results than the multiple hypothesis LSW. In this case, the extra directionality provided by the dual-tree wavelet model is able to elicit smaller changes in the anisotropic non-stationarity. Table 3 provides results on the isotropic and nonstationary process MA(1, σ ). One hundred realisations were used per value of variance σ ∈ {1.2, 1.3, 1.4, 1.5}. The results suggest that, even though the single hypothesis version of the LS-DTCW appears more sensitive to the single-hypothesis LSW, the multihypothesis LS-DTCW is only roughly comparable to the multihypothesis LSW test. Intuitively, since the 'information' about non-stationarity is spread over all directions there appears to be no advantage in splitting up the directional content of the processes any further than the three bands of the usual real-valued wavelet decomposition.
Finally, Table 4 shows the results from the Brodatz data. The percentage of null-hypothesis rejections are presented over the data set of 72 montage images. Again, the multiple hypothesis tests detect more non-stationarities than the single hypothesis tests. The LS-DTCW correctly declared all images to be non-stationary, irrespective of what angle the data were rotated. The single-hypothesis LS-DTCW performed a little worse and much more inconsistently with respect to angle than the multiple hypothesis LSW test. The single-hypothesis LSW test performed very poorly and only managed to pick up 6 or 7% of the non-stationarities. This experiment lends some credence to the view that, as long as some of the non-stationarity is distinguishable using directional information, then a wavelet decomposition with greater directionality may provide a more sensitive test.

Conclusions and further work
This work has focused on the case where the LSW spectrum is smoothly varying in the Lipschitz sense. However, for complicated image scenes with multiple objects, and distinct boundaries between those objects, a more realistic scenario would be to permit piecewise jumps in the underlying spectrum. This alternative formalism, whereby the transfer function W is assumed to have piecewise bounded variation, has been explored for univariate, time-series LSW models in several other works such as Fryzlewicz and Nason (2006) and Nason and Stevens (2015) via the Haar-Fisz device. Extending this concept to the spatial and complex LSW model would provide a natural target for further work.
The main thrust of this work was to incorporate a wavelet basis into the existing LSW model which was even more overcomplete than the existing undecimated basis. This idea can be pushed further. Other bases are possible such as the M-band extension of the dual-tree wavelets constructed by Chaux et al. (2006), wavelet packets schemes such as Bayram and Selesnick (2008), and mixtures of wavelets such as Nelson (2015). Although these constructions offer useful additional functionality, their statistically properties have yet to be explored. Placing such constructions in the LSW framework could provide an interesting means to extend the current theory and functionality.
Further work towards the non-stationarity detection application could also consider a more complete range of anisotropy testing machinery such as that proposed by Mondal and Percival (2012) and Thon et al. (2015). It could also consider incorporating the dual-tree complex framework into the very recent multibasis approach of Cardinali and Nason (2016) which discovers a broader range of non-stationary behaviours that is possible with only one basis on its own. Acknowledgements This paper was primarily written by Dr. James D. B. Nelson who sadly passed away September 2016. At various points throughout his life all the co-authors of this paper were fortunate enough to have worked with James. It is an honour and privilege to complete this work on behalf of, and in the memory of such a humble and intelligent member of our community.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A Proofs
Lemma 2 Let w, W , and C be the transfer sequence, function, and coefficients, respectively, from Definition 3. Then Proof The result is easily obtained from Definition 1, property (1), viz.
Lemma 3 Let S be the local wavelet spectrum of the LSW process X ∈ LSW(C). Then Proof The proof follows that of Eckley et al. (2010) without any necessary adjustments. We include it here for completeness. Using Definition 1, property (3), the triangular inequality, the square summability of the transfer function (Definition 1, property 2) and the boundedness of the Lipschitz constants (Definition 1, property 4), respectively, we have that Theorem 1 (Convergence of auto-covariance) Let C and C T be, respectively, the local auto-covariance and autocovariance of an LSW process {X (t)} t∈T ∈ LSW(C). Then as T → ∞, uniformly in t ∈ Z 2 , and z ∈ T 2 .
Proof Since X is zero mean, where [·] denotes the integer part of · . The orthonormality of the increments gives The remainder of the proof now follows the same arguments as Eckley et al. (2010), namely: Lemmas 2 and 3 now yield and the result is obtained by invoking the boundedness properties (1 and 4 from Definition 1) of the constants C and λ.
Proof The proof follows the spirit of Eckley et al. (2010), but, here, special attention is required to take care of the fact that both the wavelet functions ψ and the transfer sequences w are complex-valued, rather than real-valued. In addition, a slightly more 'operator-theory' flavour is applied to the final elements of the calculations whereby convolution properties are exploited to circumvent the need for some tedious changes of variables.
where the fourth equality results from the orthonormality of the increment processes ξ . From Lemma 2, we now use the fact that Now, since ν∈H C ν < ∞ and, since the (ψ ν * ψ η )(k) 2 is finite and bounded as a function of k (recall ψ has finite support), then From Lemma 3, we similarly use to get that Finally, the following convolution property completes the proof.
Lemma 1 (Bias is real) Let Ψ be the ACW (Definition 2. The entries of the biasing matrix (Definition 5) are positive, real, and symmetric.
Proof By construction, A is a Gram matrix. Hence, symmetry follows immediately. We then invoke Plancherel to write Theorem 3 (Invertibility of the biasing matrix) Let the biasing matrix A associated with an appropriately chosen real-valued wavelet ψ ·;0 defined over [[1, T ]] be non-singular. Then, the biasing matrix A (Definition 5) associated with the dual-tree wavelet (cf. Property 5 of Definition 1) ψ = ψ ·;0 + iψ ·;1 is non-singular with A = 2 A .
Noting that |z j | = 1 completes the proof.

Corollary 1 (LWS uniqueness)
The LWS is uniquely defined, given the corresponding LSW(C) process.
However, by Theorem 4, Ψ is linearly independent with respect to real scalars and the only way that (15) could hold, therefore, is if Δ η (z) = 0 which contradicts the original assumption.
Proposition 1 (LS-DTCW covariance inversion) Assume that the debiasing matrix A is non-singular, then Proof As in Nelson and Gibberd (2016) and Nason et al. (2000), we follow a proof by verification and then note that A is both symmetric and, by Lemma 1, real.: