Long memory estimation for complex-valued time series

Long memory has been observed for time series across a multitude of ﬁelds, and the accurate estimation of such dependence, for example via the Hurst exponent, is crucial for the modelling and prediction of many dynamic systems of interest. Many physical processes (such as wind data) are more naturally expressed as a complex-valued time series to represent magnitude and phase information (wind speed and direction). With data collection ubiquitously unreliable, irregular sampling or missingness is also commonplace and can cause bias in a range of analysis tasks, including Hurst estimation. This article proposes a new Hurst exponent estimation technique for complex-valued persistent data sampled with potential irregularity. Our approach is justiﬁed through establishing attractive theoretical properties of a new complex-valued wavelet lifting transform, also introduced in this paper. We demonstrate the accuracy of the proposed estimation method through simulations across a range of sampling scenarios and complex- and real-valued persistent processes. For wind data, our method highlights that inclusion of the intrinsic correlations between the real and imaginary data, inherent in our complex-valued approach, can produce different persistence estimates than when using real-valued analysis. Such analysis could then support alternative modelling or policy decisions compared with conclusions based on real-valued estimation.

constructions, just as for real-valued processes (Hurst 1951;Mandelbrot and Ness 1968), the degree of memory can still be quantified by means of a single parameter, the Hurst exponent parameter (Amblard et al. 2012;Sykulski and Percival 2016). Accurate estimation of the Hurst parameter offers valuable insight into a multitude of modelling and analysis tasks, such as model calibration and prediction (Beran et al. 2013;Rehman and Siddiqi 2009;Knight et al. 2017).
Complex-valued processes, both proper (circular) and improper (noncircular), are relevant across fields such as oceanography and geophysics Sykulski et al. 2017), where data are typically difficult to acquire and will frequently suffer from omissions/ missingness or be irregularly sampled (see, e.g. Fig. 1). In the next section, we describe datasets arising in environmental science that feature missing observations, which can be examined for long memory with a complex-valued representation. However, we note here that data from other scientific areas may benefit from analysis with our proposed methodology; see Sect. 6 for further discussion.

Persistence in wind series
Our motivating data example in this article arises from climatology. More specifically, wind series have been analysed extensively in the literature for modelling local weather patterns and spread of pollutants, as well as global climate dynamics. Long memory in wind series has been established by a number of authors; see, for example, Haslett and Raftery (1989), Chang et al. (2012) and Piacquadio and de la Barra (2014) and references therein. Specifically, Hurst exponent estimates for wind speed series on a range of sampling resolutions, including the 5 min scale considered here, have been shown to be in the range 0.7-0.9, indicating strong long-range dependence; see, for example, Fortuna et al. (2014). Accurate Hurst exponent estimation is used for accurate forecasting of wind speed, for example to assess future power yields (Haslett and Raftery 1989;Bakker and van den Hurk 2012).
Wind speed analysis in the literature is predominantly performed using real-valued data, such as (magnitude) wind speed series. However, more recently, a number of authors have advocated modelling wind measurements as complexvalued, developing analysis tools which exploit both speed and directional information of wind time series; see, for example, Goh et al. (2006) and Tanaka and Mandic (2007). These complex-valued modelling approaches have resulted in methodology for improved prediction for series such as those considered in this article Do well et al. 2014). To our knowledge, long memory estimation for stationary time series is exclusively performed using realvalued time series. In this article, we analyse the degree of persistence (long memory intensity) exhibited by complexvalued wind measurements, i.e. series which have both wind speed and direction, using new complex-valued Hurst estimation methodology we propose here.
The wind series we consider in this article consists of two datasets measured at a 5 min resolution from the Iowa Department of Transport's Automated Weather Observing System (AWOS). The (speed and angular) measurements for both datasets are available at http://mesonet.agron.iastate.edu/ AWOS/ . We firstly analyse data obtained from the Atlantic Municipal Airport (AIO) monitoring site over a period from 15 April 2017 until 30 April 2017. Whilst the sampling interval for the measurements is reported as 5 min, due to a number of reasons, for example faulty recording devices, the data in fact feature missingness which results in a mix of sampling intervals-our first dataset has intervals ranging from 5 to 15 min.
Since we have both speed and directional information for the dataset, we shall view the series using a complex-valued representation. The real and imaginary components of the series are shown in Fig. 1a, b, together with the locations of the missing data (depicted by triangles). The length of the first series is n = 3131 with an overall rate of missingness of 12%. Similar datasets from the Iowa monitoring system have been previously studied in the literature for the non-missing case but not in the context of Hurst estimation; see, for example, Tanaka and Mandic (2007) and Adali et al. (2011).
To explore the potential persistence in wind series, we examine the autocorrelation in the real and imaginary parts of the series, shown in Fig. 2a, b for the Wind A series. For these data, both components show highly significant autocorrelation over a range of lags, indicating long memory.
To further illustrate potential benefits of a more considered analysis approach for such data, we also investigate a dataset from the same monitoring site but for a different time periods, specifically, 30 April 2017 until 14 May 2017. For this dataset, the majority of the data are observed at a spacing of 5 min, but a significant amount have intra-measurement sampling between 10 and 20 min resulting from a missingness proportion of 20%; the series is of length n = 2942. We have specifically chosen to examine this second time period due to its high degree of missingness. The two components of the complex-valued series can be seen in Fig. 1c, d (triangles indicate missing series values).
Similar observations about potential long memory characteristics can be made for the second complex-valued wind series. In particular, both real and imaginary components of the series show considerable autocorrelation over a large range of lags (Fig. 2c, d).
In addition, plotting the series in the complex plane, we see that both datasets exhibit a rotational behaviour, due to the angular component of the series (Fig. 3). The series are not symmetric, exhibiting clear noncircularity, suggesting a model which allows for impropriety is appropriate for analy-  sis [for an in-depth discussion of these properties, the reader is directed to e.g. Sykulski and Percival (2016)]. This reflects similar observations on impropriety shown for other Iowa AWOS data in Adali et al. (2011), as well as other wind series ).

Aim and structure of the paper
A feature of many geophysical series, such as described in Sect. 1.1, is that there is a need to jointly analyse both components of a bivariate signal in order to reveal a common behaviour. Due to the natural representation in the complex plane, one mathematical solution is to combine the two pieces of information into a single, complex-valued series and analyse its properties . Adopting this approach thus calls for analysis techniques capable of dealing with complex-valued data. Additionally, for many applications the process sampling structure is inherently irregular, as the two components may be measured at irregular times, or the data may be blighted by missingness due to measurement device failures. In the real-valued case, the common practice of preprocessing the data to mitigate against irregular or missing observations results in inaccuracies in long memory estimation by traditional methods. More specifically, there is now well-documented evidence that preprocessing by imputation or interpolation as well as data aggregation leads to Both components of the two datasets show autocorrelation at large lags, indicating persistent behaviour overestimation of persistence; see, for example, Beran et al. (2013), Zhang et al. (2014) or Knight et al. (2017). In practice, to the authors' best knowledge, the only technique that permits Hurst exponent estimation for complexvalued processes is that of Coeurjolly and Porcu (2017) which tackles the setting of regularly sampled (proper) complexvalued fractional Brownian motion. Motivated by the serious implications of inaccurate estimation in the real-valued setting, in this work we propose the first methodological approach that answers the timely challenge of accurate assessment of long memory persistence for complex-valued processes featuring regular or irregular sampling (including missingness).
At the heart of our methodology is a second generation wavelet-based approach. The reasoning behind this choice is twofold: (1) (classical) wavelets have proved to be very successful in the context of regularly sampled (real-valued) time series with long memory and are considered the 'right domain' of analysis (Flandrin 1998), and (2) for irregularly sampled (real-valued) processes, or those featuring missingness, the wavelet lifting algorithm of Knight et al. (2017) has provided a first long memory estimation solution and was shown to yield competitive results even for regularly sampled data.
The main contributions of the work in this paper are as follows. We propose (1) a novel lifting algorithm designed to work on complex-valued data with a potentially irregular sampling structure and (2) a Hurst parameter estimator for complex-valued processes sampled with a regular or irregular structure. Our method will be shown to improve on real-valued Hurst estimation results, including for regularly spaced data.
The remainder of this article is organized as follows. We begin, in Sect. 2, by reviewing (complex-valued) long memory processes and giving an overview of wavelet lifting transforms. Section 3 introduces our novel complex-valued lifting transform, establishes its iterative bases construction and theoretical results on its decorrelation properties. Section 4 demonstrates how these properties can be exploited to design our proposed lifting-based Hurst exponent estimation procedure for complex-valued data sampled with irregularity/ missingness. Section 5.1 contains a simulation study evaluating the performance of our new method using synthetic data. In Sect. 5.2, we consider the application of our approach to the wind series datasets introduced in Sect. 1.1, discussing the potential consequences of our analysis. Finally, Sect. 6 outlines some avenues of future work and discusses other potential applications.

Complex-valued processes
Let us denote a (complex-valued) second-order stationary time series by {X t } and its autocovariance function as γ X (t i − t j ) = E(X t i X t j ), under the assumption that E(X t ) = 0 and denoting by · p complex conjugation. As the autocovariance function γ X does not completely characterize a complex-valued time series, we also make use of its complementary or pseudo-covariance, r X (t i − t j ) = E(X t i X t j ), again assuming E(X t ) = 0. In general, both autocovariances are complex-valued and have the properties of Hermitian symmetry and symmetry, respectively [see, e.g. Sykulski and Percival (2016)]. In many applications, such as radar and communications, processes are assumed to have the property that r X (· p) = 0 (Neeser and Massey 1993;Picinbono 1994;A d a l ie ta l . 2011); such processes are known as proper or circularly symmetric and are completely determined by their autocovariance γ X . In contrast, applications such as those described in Schreier and Scharf (2010), Adali et al. (2011) and Chandna and Walden (2017) deal with improper processes, whereby there exists a lag τ such that r X (τ ) = 0. Another often encountered property is that of time reversibility; for complex-valued processes, Didier and Pipiras (2011)h a v e shown that time reversibility results in complex-valued processes with real-valued autocovariances, which is precisely the setting under which Sykulski and Percival (2016) develop their exact simulation method for improper stationary Gaussian processes.

Long memory and its estimation
Classical literature for long-range behaviour of real-valued processes shows that persistence is often characterized by a parameter, such as the Hurst exponent, H , introduced to the literature by Hurst (1951) in hydrology and its estimation is treated across a large body of established literature, for example Beran et al. (2013). Mandelbrot and Ness (1968) introduced self-similar and related processes with long memory, along with the associated statistical inference. Extensions of fractional Brownian motion to the complexvalued case, defined as a self-similar Gaussian process with stationary increments, are dealt with in, for example, Coeurjolly and Porcu (2017) and Lilly et al. (2017). Put simply, the property of self-similarity amounts to the preservation of the process' statistical properties in the face of rescaling, thus naturally fostering the definition of the Hurst exponent.
Just as in the real-valued case, a complex-valued self- a H X (t) for a > 0, H ∈ (0, 1) and where d = means equal in distribution (Coeurjolly and Porcu 2017). Note that the self-similarity definition implies that both the real and imaginary strands of the complex-valued process {X t } evolve according to the same exponent H . The property of self-similarity results into the fBM spectrum to behave as f X (ω) = A 2 |ω| −2δ for frequencies ω, a constant A and δ ∈ (1/2, 3/2). The spectral slope parameter δ is linked to the aspect ratio of process rescaling for self-similar behaviour as H = δ − 1/2 ∈ (0, 1) and also determines the degree of persistence in the differenced version of the process, the fractional Gaussian noise ). An example of such a process is the improper fractional Gaussian noise with the pseudo-covariance proportional to the autocovariance (both real-valued), both proportional to τ 2δ−3 (Sykulski and Percival 2016;Lilly et al. 2017).
For real-valued time series, estimation of the Hurst exponent H traditionally takes place in the time domain (Mandelbrot and Taqqu 1979;Bhattacharya et al. 1983;Taqqu et al. 1995;Giraitis et al. 1999;Higuchi 1990;Peng et al. 1994) and/ or in the frequency domain by means of connections to Fourier or wavelet spectrum decay, for example Lobato and Robinson (1996), McCoy and Walden (1996), Whitcher and Jensen (2000) and Abry et al. (2013). Recent works that deal with long memory estimation in various settings are Vidakovic et al. (2000), Shi et al. (2005), Hsu (2006), Jung et al. (2010) and Coeurjolly et al. (2014). Some authors have recently considered Hurst estimation using complexvalued wavelets in the regularly spaced real-valued image context; see Nelson and Kingsbury (2010), Jeon et al. (2014) and Nafornita et al. (2014). Reviews comparing several techniques for Hurst exponent estimation (for real-valued series) can be found in, for example, Taqqu et al. (1995). Even when only considering real-valued data, Knight et al. (2017)show that methods designed for regularly spaced data often fail to deliver a robust estimate if the time series is subject to missing observations or has been sampled irregularly, and in this context they propose a lifting-based approach for Hurst estimation. Whilst this approach serves well when the process is real-valued, it cannot cope with complex-valued processes. Coeurjolly and Porcu (2017) propose a method of estimation in the setting of (circular) complex-valued fractional Brownian motion assuming a regular sampling structure, but cannot readily cope with sampling irregularity or measurement dropout/ missingness.

Wavelet lifting paradigm for irregularly sampled real-valued data
The lifting algorithm, first introduced by Sweldens (1995), constructs 'second-generation' wavelets adapted for nonstandard data settings, such as intervals, surfaces, as well as irregularly spaced data. Lifting has since been used successfully for a variety of statistical problems dealing with real-valued signals, including nonparametric regression, spectral estimation and long memory estimation; see, for example, Trappe and Liu (2000), Nunes et al. (2006), Knight et al. (2012), Knight et al. (2017) and Hamilton et al. (2017). For a recent review of lifting, the reader is directed to Jansen and Oonincx (2005). As our proposed lifting transform and subsequent long memory estimation method both make use of a recently developed lifting transform, the lifting one coefficient at a time (LOCAAT) transform of Jansen et al. (2001), Jansen et al. (2009), we shall briefly introduce it next.
Suppose a real-valued function f (· p) is observed at a set of n, possibly irregular, locations or time points, x = (x 1 ,...,x n ) and is represented by . The lifting algorithm of Jansen et al. (2001) begins with the f = ( f 1 , ..., f n ) values, known as scaling function values, together with an interval associated with each location, x i , which represents the 'span' of that point. By performing LOCAAT, we aim to transform the initial f into a set of, say, L coarser scaling coefficients and (n − L) wavelet or detail coefficients, where L is a desired 'primary resolution' scale. This is achieved by repeating three steps: split, predict and update. In the algorithm of Jansen et al. (2001), the split step is performed by choosing a point to be removed ('lifted'), j n , say. We denote this point by (x j n , f j n ) and identify its set of neighbouring observations, I n .T h epredict step estimates f j n by using regression over the neighbouring locations I n . The prediction error (the difference between the true and predicted function values), d j n or detail coefficient, is then computed by where (a n i ) i∈I n are the weights resulting from the regression procedure. For points with only one neighbour, the prediction is simply d j n = f j n − f i . This prediction via regression can of course be carried out using a variety of weights. Notably, Hamilton et al. (2017) proposed to use two (rather than just one) prediction filters and encompassed the detail information into complex-valued wavelet coefficients. As more information was extracted from the signal, this approach was shown to improve results for nonparametric regression and spectral/ coherence estimation settings, but nevertheless is limited to real-valued signals. The update step consists of updating the f -values of the neighbours of j n used in the predict step using a weighted proportion of the detail coefficient: where the weights (b n i ) i∈I n are subject to the constraint that the algorithm preserves the signal mean value (Jansen et al. 2001(Jansen et al. , 2009). The interval lengths associated with the neighbouring points are also updated to account for the effect of the removal of j n . In effect, this attributes a portion of the interval associated with the removed point to each neighbour.
These split, predict and update steps are then repeated on the updated signal, and after each iteration a new wavelet coefficient is produced. Hence, after say (n − L) removals, the original data are transformed into L scaling and (n − L) wavelet coefficients. This is similar in spirit to the classical discrete wavelet transform (DWT) step which takes a signal vector of length 2 ℓ and through filtering operations produces 2 ℓ−1 scaling and 2 ℓ−1 wavelet coefficients.
An attractive feature of lifting schemes, including the LOCAAT algorithm, is that the transform can be inverted easily by reversing the split, predict and update steps.
The current scarcity of Hurst estimation techniques for complex-valued processes, in a uniform, but even more so in a non-uniform sampling setting, and the effectiveness of the lifting transform in representing irregularly sampled information, jointly motivate our proposed approach to tackle this analysis problem: firstly we propose a novel lifting transform able to cope with irregularly sampled complexvalued processes, and secondly we construct a long memory estimator using the corresponding complex-valued lifting coefficients. Notably, the proposed method is suitable for regularly or irregularly sampled processes, both real-and complex-valued; in particular, Hurst estimation is addressed for improper complex-valued processes that have real-valued covariances, as introduced in Sykulski and Percival (2016), as well as for proper complex-valued series, as described in Coeurjolly and Porcu (2017).

A new lifting algorithm for complex-valued signals and its properties
In this section, we introduce our proposed lifting algorithm for a complex-valued function and establish its decorrelation properties.

Proposed C 2 -LOCAAT algorithm for complex-valued signals
Suppose now a complex-valued function f (· p) is observed at a set of n, possibly irregular, locations or time points, x = (x 1 , ..., x n ) and is represented by Our proposed algorithm builds a redundant transform that starts with the complex-valued signal f = ( f 1 , ..., f n ) ∈ C n and transforms it into a set of, say, R coarse (complexvalued) scaling coefficients and 2×(n− R) (complex-valued) detail coefficients, where R is the desired primary resolution scale. As is usual in lifting, our algorithm reiterates the three steps-split, predict and update-in a modified version, as described below.
At the first stage (n) of the algorithm, denote the smooth coefficients as c n,k = f k , the set of indices of smooth coefficients by S n ={ 1,...,n} and the set of indices of detail coefficients by D n =∅. The sampling structure is accounted for using the distance between neighbouring observations, and at stage n we define the span of x k as s n,k = x k+1 −x k−1 2 . At the next stage (n −1), the proposed algorithm proceeds as follows: Split Choose a point to be removed and denote its index by j n . Typically, points from the densest sampled regions are removed first, but other predefined removal choices are also possible, as we shall discuss below. We shall often refer to the removal order as a trajectory, following Knight and Nason (2009).
Predict The set of neighbours (J n ) of the point j n is identified. Note that the set of neighbours is indexed by n as the choice will depend on the removal stage (via the points remaining at that stage). The predict step estimates c n, j n = f j n by using regression over the neighbouring locations J n and two prediction schemes, a strategy first suggested by Hamilton et al. (2017) for real-valued signals. Each prediction scheme is defined by its respective filter, L and M, orthogonal on each other. The filter L corresponds to the (possibly) linear regression choice as is usual in LOCAAT. The filter M is linked to L through a specific set of properties, discussed in detail in Hamilton et al. (2017) and described in step 2 of Algorithm 1. Both filters are constructed such that the corresponding wavelet coefficients of any constant polynomial are 0 (known in the wavelet literature, as possessing (at least) one vanishing moment).
The prediction residuals following the use of each filter are given by where {l n i } i∈J n ∪{ j n } and {m n i } i∈J n ∪{ j n } are the prediction weights associated with filters L and M; as is typical in LOCAAT, we take l n j n = 1. Our proposal is to obtain two complex-valued detail (wavelet) coefficients by combining the two prediction residualsasfollows: Note that if the original signal is real-valued, then d (2) = d (1) and all we need is d (1) . However, when the process is complex-valued as is the case here, d (2) = d (1) and we need both d (1) and d (2) . This is in contrast to Hamilton et al. (2017), where the information from the two prediction schemes is corroborated into just one complex-valued wavelet coefficient, and although its naive implementation on the real and imaginary process strands would yield two sets of complexvalued wavelet coefficients, it would not be obvious how to best combine their information.
Update In the update step, both the (complex-valued) smooth coefficients {c n,i } and (real-valued) spans of the neighbours {s n,i } are updated according to filter L: where b n i = (s n, j n s n−1,i )/( i∈J n s 2 n−1,i ) are the update weights, again computed so that the mean of the signal is preserved (Jansen et al. 2009). Updating the neighbours' spans accounts for the modification to the sampling grid induced by removing one of the observations, and using just one filter for update [akin to the approach of Hamilton et al. (2017)] ensures the use of a common scale across both d (1) and d (2) .
The observation j n is then removed from the set of smooth coefficients; hence, after the first algorithm iteration, the index set of smooth coefficients is S n−1 ={ 1, ..., n}\{ j n } and the index set of detail coefficients is D n−1 ={j n }.The algorithm is then reiterated until the desired primary resolution level R has been achieved. In practice, the choice of the primary level R in LOCAAT lifting schemes is not crucial provided it is sufficiently low (Jansen et al. 2009), with R = 2 recommended by Nunes et al. (2006).
The three steps are then repeated on the updated signal, and each repetition yields two new wavelet coefficients. After points j n , j n−1 ,..., j R+1 have been removed, the function can be represented as a set of 2 × (n − R) detail coefficients, {d j k } k∈D n−R , and R smooth coefficients, {c r −1,i } i∈S n−R , thus resulting in a redundant transform. An algorithmic description of C 2 -LOCAAT appears in Algorithm 1.
The proposed algorithm can then be easily inverted by recursively 'undoing' the update, predict and split steps described above for the first filter (L). More specifically, the inverse transform can be performed by the steps Undoing either predict (8)or(9) step is sufficient for inversion.
A few remarks on our proposed C 2 -LOCAAT lifting algorithm are now in order.
Transform matrix representation As with any linear transform, the algorithm that determines one set of detail coefficients, say d (1) , can also be represented using a matrix transform, i.e. d (1) = W (c) f , where W (c) is a n × n Proposed C 2 -LOCAAT using two symmetrical neighbours: Choose a removal order (trajectory), either dictated by the sampling sequence or following a random permutation.
1. Split: Choose the first/next point to be removed from the set of smooth coefficients S n ={1, ..., n} and denote its index by j n . 2. Predict: (a) Determine the set of neighbours J n (one each side of j n )and use linear regression over the neighbourhood in order obtain a prediction at j n . Calculate the prediction residual, λ jn , as the difference between the observed and predicted values at j n (see Eq. (3)). This coupled with the requirement of achieving at least one vanishing moment amounts to obtaining a filter L = (l 1 , 1, l 3 ) with l 1 + l 3 = 1. (2) jn = λ jn − i μ jn . 3. Update: the smooth coefficients and their associated scales using the filter L (see Eq. (7)).
Update the index sets of smooth and detail coefficients as S n−1 = S n \{ j n } and D n−1 ={j n }, respectively. 4. Iterate steps 1-3 for j n−1 ,..., j R+1 with a typical primary resolution level R = 2, hence obtain a set of complex-valued wavelet coefficients indexed by D R ={j n , ..., j R+1 }.

Alg. 1
The complex-valued lifting scheme (C 2 -LOCAAT) on a complex-valued signal matrix with complex-valued entries. When expressed as a matrix transform, our proposed C 2 -LOCAAT algorithm for a complex-valued process ( f ) can be expressed as with

Wavelet lifting scales and artificial levels
The (log 2 ) span associated with an observation at the last stage before its removal, say log 2 (s k, j k ) for the detail coefficient d j k obtained at stage k, is used as a (continuous) measure of scale-this indirectly stems from the fact the wavelets are not dyadically scaled versions of a single mother wavelet. As the notion of scale of lifting wavelets is continuous, Jansen et al. (2009)  . We also adopt this strategy to group the complex-valued wavelet coefficients produced using our C 2 -LOCAAT algorithm. An alternative is to group the coefficients via their interval lengths into ranges (2 j−1 α 0 , 2 j α 0 ], where j ≥ 1 and α 0 is the minimum scale. This construction more closely resembles classical wavelet dyadic scales, but both produce similar results. Note that by construction, the C 2 -LOCAAT transform crucially uses a common scale for both real and imaginary parts, and it is this feature that ensures that information is obtained on the same scale at every step.  (2009) introduced the nondecimated lifting transform, which proposes examining data using P bootstrapped paths from the space of n! possible trajectories. Aggregating the information obtained via this approach typically improves estimator variance and accuracy, not only in the long memory estimation context ), but also for, for example nonparametric regression (Knight and Nason 2009). This strategy will be embedded in our proposed methodology in Sect. 4.

Refinement equations for the scaling and wavelet functions under C 2 -LOCAAT
Although not explicitly apparent, the wavelet lifting construction induces a biorthogonal (second generation) wavelet basis construction; see, for example Sweldens (1995). In the real-valued lifting one coefficient at a time paradigm, as the algorithm progresses, scaling and wavelet functions decomposing the frequency content of the signal are built recursively according to the predict and update Eqs. (1) and (2) (Jansen et al. 2009). Also, the (dual) scaling functions are defined recursively as linear combinations of (dual) scaling functions at the previous stage.
Let us now investigate the basis decomposition afforded by our proposed C 2 -LOCAAT transform, as a result of performing the split, predict and update steps. As our construction involves two prediction filters, we decompose f on two biorthogonal bases. Our construction is reminiscent of the dual-tree complex wavelet transform (CWT) (Kingsbury 2001;Selesnick et al. 2005) which employs two separate classical wavelet transforms, but fundamentally differs through the construction of linked orthogonal filters.
In our proposed construction, let us denote the two scaling function and wavelet biorthogonal bases by ϕ (1) ,φ (1) ,ψ (1) ,ψ (1) and ϕ (2) ,φ (2) ,ψ (2) ,ψ (2) , respectively. We now explore their relationships and recursive construction. At stage r , the complex-valued signal f can be decomposed on each basis as r ,k > for both bases i = 1, 2, where the inner product is as usual defined on L 2 (C). As the update step is the same for both bases, it follows that c (1) r ,k >, for all r , k and thus the dual scaling functions coincide under both bases. In what follows, we shall denote these byφ r ,k .

Proposition 1 Suppose we are at stage r − 1 of the C 2 -LOCAAT algorithm. The recursive construction of the primal scaling and wavelet functions corresponding to the coefficients d (1) , in terms of the functions at the previous stage r , is given by
where a r j = ℓ r j + i m r j andã r j = a r jr a r j |a r jr | 2 . Similarly, the recursive construction for the primal scaling and wavelet functions corresponding to the coefficients d (2) , in terms of the functions at the previous stage r , is given by For the corresponding dual bases, the recursive constructions are given bỹ whereψ L denotes the dual wavelet function corresponding to the L-filter only.

Decorrelation properties of the C 2 -LOCAAT algorithm
Wavelet transforms are known to possess good decorrelation properties; see in the context of long memory processes, for example, Abry et al. (2000), Jensen (1999), Craigmile et al. (2001) for classical wavelets, and Knight et al. (2017) for lifting wavelets constructed by means of LOCAAT. The decorrelation property amounts to the consequent removal of the long memory in the wavelet domain, and thus estimation of the Hurst exponent can be carried out in this simplified context. Therefore, we next provide mathematical evidence for the decorrelation properties of the C 2 -LOCAAT algorithm and these will subsequently benefit our proposed long memory estimation procedure (see Sect. 4). The statement of Proposition 2 (next) aims to establish decorrelation results similar to earlier ones concerning regular wavelets (see, e.g. Abry et al (2000, p. 51) for fractional Gaussian noise, Jensen (1999, Theorem 2) for fractionally integrated processes or Theorem 5.1 of Craigmile and Percival (2005) for fractionally differenced processes) and lifting wavelets [see Proposition 1 in Knight et al. (2017)]. In what follows, we establish the decorrelation properties for the proposed complex-valued lifting transform C 2 -LOCAAT in a more general data setting than previously considered for lifting wavelets, involving complex-valued stationary processes with real-valued autocovariances that may be proper or improper in nature.  (2) j r } r have autocorrelation and pseudo-autocorrelation whose magnitudes decay at a faster rate than for the original process.

Proposition 2 Let X ={ X t i } N −1 i=0 denote a (zero-mean) stationary long memory complex-valued time series with Lipschitz continuous spectral density f X . Assume the process is observed at irregularly spaced times {t
The proof can be found in 'Appendix A, Section A.2' and uses similar arguments to the proof of Proposition 1 in Knight et al. (2017), adapted for the C 2 -LOCAAT algorithm and complex-valued setting we address here. Just as for LOCAAT ), Proposition 2 above assumes no specific lifting wavelet and we conjecture that if smoother lifting wavelets were employed, it might be possible to obtain even better rates of decay.

Long memory parameter estimation using complex wavelet lifting (CLoMPE)
As the newly constructed wavelet domain through C 2 -LOCAAT displays small magnitude autocorrelations, we now focus on the wavelet coefficient variance and show that the log 2 -variance of each of the complex-valued lifting coefficients d (1) and d (2) is linearly related to their corresponding artificial scale level, a result paralleling classical and realvalued lifting wavelet results. This result suggests a Hurst parameter estimation method for potentially irregularly sampled long memory processes that take values in the complex (C) domain. Proposition 3 next establishes a result similar to that in Proposition 2 of Knight et al. (2017) by taking into account the specific C 2 -LOCAAT construction and thus extends the scope of Hurst estimation methodology to irregularly sampled complex-valued processes.
i=0 denote a (zero-mean) complex-valued long memory stationary time series with finite variance and spectral density f X (ω) ∼ c f |ω| −α as ω → 0,forsomeα ∈ (0, 1). Assume the series is observed at irregularly spaced times {t i } N −1 i=0 , and transform the observed data X into a collection of lifting coefficients, {d (1) j r } r and {d (2) j r } r , via application of C 2 -LOCAAT from Sect. 3.1. Let r denote the stage of C 2 -LOCAAT at which we obtain the wavelet coefficients d (ℓ) j r (with ℓ = 1, 2), and let its corresponding artificial level be j ⋆ . Then, denoting by |· p| the C-modulus, we have for some constant K The proof can be found in 'Appendix A, Section A.3'. This result suggests a long memory parameter estimation method for an irregularly sampled, complex-valued time series, described in Algorithm 2, which we shall refer to as CLoMPE (Complex-valued Long Memory Parameter Estimation Algorithm). Section 5.1, next, will show that our proposed CLoMPE methodology below not only adds a new much needed tool in the estimation of long memory for complex-valued processes, but also improves Hurst exponent estimation for real-valued processes, sampled both regularly and irregularly.
where n j ⋆ is the number of observations in artificial level j ⋆ .Note that the C 2 -LOCAAT construction, by its use of an unique update step, ensures that the number of observations in each j ⋆ artificial level coincides for both ℓ = 1andℓ = 2. 4. Fit a weighted linear regression to all points log 2 σ (ℓ) j ⋆ 2 with ℓ = 1, 2 versus j ⋆ ; use its slope to estimate α as suggested by the results in Proposition 3. Note that Eq. (23) allows us to pull the information across both d (1) and d (2) . 5. Iterate steps A-1 to A-4 for P bootstrapped trajectories, obtaining an estimateα p for each trajectory p ∈ 1, P. The final estimator iŝ α = P −1 P p=1α p , from which an appropriate estimate for H can be obtained.

Alg. 2
The long memory parameter estimation procedure (CLoMPE) for a complex-valued process {X t i } N −1 i=0 , sampled at potentially irregularly spaced times 5 Simulated performance of CLoMPE and real data analysis

Simulated performance of CLoMPE
In what follows, we investigate the performance of our Hurst parameter estimation technique for complex-valued series. We simulated realizations of two types of long memory processes, namely circularly symmetric complex fractional Brownian motion, as introduced in Coeurjolly and Porcu (2018), and improper complex fractional Gaussian noise (with real-valued covariances) as described in Sykulski and Percival (2016), 1 investigating series of lengths of 256, 512 and 1024. These lengths were chosen to reflect realistic data collection scenarios-long enough for the Hurst parameter (a low-frequency asymptotic quantity) to be reasonably esti-mated, whilst reflecting lengths of datasets encountered in practice.
To investigate the effect of sampling irregularity on the performance of our method, we simulated datasets with different levels of random missingness (5-20%), which are representative of degrees of missingness reported in many application areas, for example in paleoclimatology and environmental series (Broersen 2007;Junger and Ponce de Leon 2015).
We compared results across the range of Hurst parameters H = 0.6,...,0.9. Each set of results is taken over K = 100 realizations and P = 50 lifting trajectories. Our CLoMPE technique was implemented using modifications to the code from the liftLRD package (Knight and Nunes 2016) and CNL-Treg package fortheR statistical programming language (R Core Team 2013), both available on CRAN. The measure we use to assess the performance of the methods is the mean squared error (MSE) defined by In the case of regularly spaced circularly symmetric fractional Brownian motion (i.e. 0% missingness), we compare our CLoMPE estimation technique with the recent estimation method in Coeurjolly and Porcu (2017) (denoted 'CP'). 2 Table 1 reports the mean squared error for our CLoMPE estimator on the complex-valued fractional Brownian motion series for different degrees of missingness (0% up to 20%). In the case of regularly spaced series, our estimation method works well when compared to the 'CP' method. This is pleasing since the "CP" method is designed for regularly spaced series, whereas CLoMPE is specifically designed for irregularly spaced series. The tables also show that the CLoMPE technique is robust to the presence of missingness, attaining good performance even for high degrees of missingness (20%).
For the complex-valued fractional Gaussian noise, Table 2 demonstrates that our CLoMPE estimation technique performs well for regular and irregular settings, with only a slight degradation in performance for increasing missingness.
We also studied the empirical bias of our estimator for both types of long memory process. For reasons of brevity, we do not report these results here, but these can be found in Appendix B in the supplementary material. As for the mean squared error results above, there is a small drop in performance with increasing missingness, and our estimator performs only slightly worse in terms of bias when compared to the 'CP' method.

Real-valued processes
To assess whether our complexvalued approach achieves performance gains for real-valued processes, we repeated the simulation study from Knight et al. (2017) for a number of long memory processes. In particular, we studied the performance of our estimator for real-valued fractional Brownian motion, fractional Gaussian noise and fractionally integrated series, for a range of Hurst parameters and levels of missingness. The processes were simulated via the fArma add-on package (Wuertz et al. 2013). We compare our method with the real-valued lifting technique of Knight et al. (2017), shown to perform well in a number of settings. Again, for brevity, we do not report these bias results here, but they can be found in Appendix B in the supplementary material. The results show that our method is competitive with the real-valued estimation method in Knight et al. (2017), achieving better results (in terms of MSE and bias) in the majority of cases for fractional Gaussian noise and fractionally integrated series. For fractional Brownian motion, we observe that our method achieves gains in mean square error, albeit at a cost of a decrease in bias performance. These results agree with other studies using complex-valued wavelet methodology, which is shown to outperform its real-valued counterpart in a variety of applications, from denoising (Barber and Nason 2004 to Hurst estimation in the (real-valued) image context (Nelson and Kingsbury 2010;Jeon et al. 2014;Nafornita et al. 2014). This is due to the use of two rather than just one filter, thus eliciting more information from the signal under analysis.

Analysis of complex-valued wind series with CLoMPE
In this section, we provide a more detailed long memory analysis of the complex-valued wind series described in Sect. 1.1. More specifically, we applied our CLoMPE Hurst estimation method to the (detrended) irregularly sampled wind series to assess its persistence properties. The estimated Hurst parameter wasĤ C = 0.86 for the Wind A series andĤ C = 0.8 for the Wind B series, based on P = 50 lifting trajectories. Both of these estimates indicate moderate long memory.

3 (4) 3 (3) 3 (3) 3 (5) 2 (2) 2 (3) 3 (3) 3 (3) 2 (2) 3 (2) 3 (2) 4 (3)
It could also be argued that these differences in estimates are unsurprising, since the dependence structure for the magnitude series, shown in Fig. 4, is visibly different to that of the real and imaginary component series shown in Fig. 2. We argue that our estimation of the long mem-ory parameter for this series is more reliable than that in the currently existing literature, as our proposed algorithm naturally encompasses both the complex-valued and improper features of wind series. A complex-valued analysis using our approach could hence provide more accurate long memory information, reducing miscalibration of predictive climate models. We further suggest that this precision would provide more certainty when assessing renewable energy resource potential, as discussed in, for example, Bakker and van den Hurk (2012).

Discussion
Hurst exponent estimation is a recurrent topic in many scientific applications, with significant implications for modelling and data analysis. One important aspect of real-world datasets is that their collection and monitoring are often not straightforward, leading to missingness, or to the use of proxies with naturally irregular sampling structures. In parallel, in many applications of interest there is a natural complexvalued representation of data. To this end, this article has proposed the first Hurst estimation technique for complexvalued processes with sampling missingness or irregularity, and in doing so it has also constructed a novel lifting algorithm able to work on complex-valued data sampled with irregularity. Until the work in this article, Hurst estimation The dependence structure is markedly different to that shown for the real and imaginary series components shown in Fig. 2 methods have not been able to exploit the wealth of signal information in such data, whilst also coping with irregular sampling regimes. Our CLoMPE wavelet lifting methodology was shown to give accurate Hurst estimation for a variety of complex-valued fractional processes and is suitable for both proper and improper complex-valued processes. Simulations demonstrate that the technique is robust to estimation with significant degrees of missingness, as well as in the non-missing (regular) setting. We have demonstrated the use of our CLoMPE technique in an application arising in environmental science. Through our analysis of wind speed data, we have shown that embedding directional wind information in the analysis can lead to significantly different Hurst exponent estimates when compared to only considering real-valued information, such as magnitude series. This highlights that not exploiting a complex-valued data representation in this setting can potentially result in misleading conclusions being drawn about wind persistence. This in turn has a subsequent impact on parameters in climate models and inefficiencies in resource management decisions.
Whilst the development of our proposed complex-valued Hurst estimator was motivated by an application in climatology, we believe that the work in this article has sufficient generality to have appeal in other settings. We thus conclude this article with outlining some example applications in which our methodology is potentially beneficial. Data from neuroimaging studies Functional magnetic resonance imaging (fMRI) data continue to enjoy popularity in the neuroscience community due to their non-invasive acqui-sition and data richness; see, for example, Aston and Kirch (2012) for an accessible introduction to the area from the statistical perspective. In particular, fMRI studies often measure information on blood flow in the brain; these voxel-level data are used to investigate neuronal activity of participants during task-based experiments, and many authors have asserted that such time courses possess fractional noise structure, see, for example, Bullmore et al. (2003). Evaluation of the Hurst exponent in this context has been shown to be important in characterizing brain activity under a range of conditions, indicating different levels of cognitive effort Ciuciu et al. 2012;Churchill et al. 2016). Despite data collection being performed in a controlled set-up, recent work has highlighted the need for tailored statistical methodology to cope with both unbalanced designs, as well as missingness, which can feature in fMRI data for a number of reasons (Lindquist 2008;Ferdowsi and Abolghasemi 2018). In actuality, fMRI scanners record both phase and magnitude information, though most studies only use the magnitude image for analysis. As a result, there has been a recent body of work dedicated to complex-valued analysis of fMRI data, most notably by Rowe and collaborators [see, e.g. Rowe (2005) and Rowe (2009) and Adrian et al. (2018)]. Such an approach has shown improvements over real-valued methods for a range of analysis tasks; see also the work by Adali and collaborators (Calhoun et al. 2002;Lietal.2011;Rodriguez et al. 2012). Thus, our methodology has the potential of taking advantage of the full complex-valued image information whilst also coping with the inherent non-uniform sampling.
Ocean surface measurement devices There is a long-standing history of studying ocean circulation using GPS-tracked ocean buoy drifters, see e.g. Osborne et al. (1989). Since these trajectories are measured in the longitude-latitude plane, they are often converted to complex-valued vector series; see, for example, Sykulski et al. (2017). It has long been observed that due to the buffeting motion of ocean currents, positional drifter trajectories often exhibit fBM-like behaviour, whilst their velocity over time resembles fGn characteristics (Sanderson and Booth 1991;Summers 2002;Qu and Addison 2010;Lilly et al. 2017). In this context, accurate Hurst exponent estimation is useful in indicating the intensity of ocean turbulence, giving evidence towards particular theorized dynamical regimes (Osborne et al. 1989). These in turn can provide insight into initial conditions and origin of ocean circulation. Moreover, the trajectories often display rotary characteristics (Elipot and Lumpkin 2008;Elipot et al. 2016). Due to the interrupted nature of satellite coverage and the possibility of measurements from multiple satellite orbits, the temporal sampling of the trajectories are typically highly non-uniform. In addition, due to the irregular sampling structure, the data are often interpolated prior to analysis (Elipot et al. 2016). One aspect of exploration in this setting could be to contrast Hurst estimation using our proposed methodology with/without data interpolation to investigate its effect, since previous work substantiates that such processing can produce bias (in the context of Hurst exponent estimation) for real-valued series . It would also be interesting to investigate modifications to our technique to parameter estimation for Matérn processes discussed in Lilly et al. (2017). a n jn |a n jn | 2 1 − j∈J n a n j b n j . Since f (x) := ψ (1) j n (x), we then have Using the primal scaling function construction in Eq. (27), we obtain an expression for the primal wavelet function which demonstrates the recursive construction from stage n to n − 1 and concludes the proof for the primal wavelet and scaling function construction.
For the dual scaling functions, we use the update equations and the fact that c r , j =< f ,φ r , j > for any r , hence we have, at stage n, whereψ L denotes the dual wavelet function corresponding to the L-filter only. Thus, the recursive relations for the dual scaling functions arẽ Similarly, since d (1) j n = c n, j n a n j n − j∈J n a n j c n, j ,wehave < f ,ψ (1) j n >=< f , a n j nφ n, j n − j∈J n a n jφ n, j > and we obtain the dual wavelet constructioñ ψ (1) j n = a n j nφ n, j n (x) − j∈J n a n jφ n, j (x).
These steps are subsequently reiterated, and hence the same also holds for stage r . In order to obtain the primal scaling function recursive construction corresponding to the second basis, we proceed in the same way as for the first basis and similarly obtain We obtain the primal wavelet equations in a similar manner to the previous development ψ (2) j n (x) = a n j n |a n j n | 2 ϕ (2) n, j n (x) − j∈J n b n j ϕ (2) n−1, j (x).
The above equations show that the primal scaling and wavelet functions corresponding to the second basis are the conjugates of the corresponding primal and wavelet functions under the first basis, respectively. As already explained, the update step is the same for both bases and c r , r ,k >, for all r , k thus the dual scaling functions coincide under both bases (φ (1) r ,k =φ (2) r ,k ). For the dual wavelet function, following the same approach as above, we obtaiñ ψ (2) j n (x) = a n j nφ n, j n (x) − j∈J n a n jφ n, j (x).
This concludes the proof for the second basis. ⊓ ⊔

A.2 Proof of Proposition 2
Let {X t } be a zero-mean complex-valued stationary long memory series with autocovariance γ X (τ ) ∼ c γ |τ | −β .W e note here that for improper processes of the type considered in Sykulski and Percival (2016), the pseudo-autocovariance has the same decay rate as the autocovariance (r X (τ ) ∼ c r |τ | −β ) whilst for proper processes, r X (τ ) = 0, ∀τ , hence we shall concentrate on the lifting decorrelation properties for improper processes. The autocovariance of {X t } can be written as γ X (t i −t j ) = E(X t i X t j ) and r X (t i − t j ) = E(X t i X t j ), assuming E(X t ) = 0, where 0 is to be understood as the complex number 0 = 0 In what follows, we drop the superscript (ℓ) in order to avoid notational clutter.
Using the assumption that E(d j ) = 0, it follows that (29) w h e r ew eh a v eu s e dd j r =< X ,ψ j r >, and the timepoints j r and j k are distinct. In what follows, denote the interval length (i.e. continuous scale) of detail d j r by I r , j r . Since from (15) and (22), regardless of whether we work with the basis indexed by ℓ = 1orℓ = 2, the (dual) wavelet functions are linear combinations of the (same) dual scaling functions, hence Eq. (29) can be rewritten as where A generically denotes the appropriate coefficient that corresponds to basis ℓ = 1orℓ = 2, butφ is the same for both bases.
As C 2 -LOCAAT progresses, the (dual) scaling functions are defined recursively as linear combinations of (dual) scaling functions at the previous stage, see, for example, Eq. (19). Hence, the scaling functions in the above equation can be written as linear combinations of scaling functions at the first stage (i.e. r = n). Due to the linearity of the integral operator, (30) can be written as a linear combination with complex-valued coefficients of terms like B n,i, j := Rφ n,i (t) Rφ n, j (s)γ X (t − s) ds dt = Rφ n,i (t) φ n, j ⋆γ X (t) dt, where ⋆ is the convolution operator, and i and j refer to time locations that were involved in obtaining d j r and d j k .N o t e that at this stage we do not use complex conjugation as the (dual) scaling functions are initially defined (at stage r = n) as scaled characteristic functions of the intervals associated with the observed times, i.e.φ n,i (t) = I −1 n,i χ I n,i (t) (thus realvalued).
Using Parseval's theorem in Eq. (31)gives B n,i, j = (2π) −1 Rφ n,i (ω) φ n, j ⋆γ X (ω) dω where in generalĝ denotes the Fourier transform of g.Asthe Fourier transform of an initial (dual) scaling function (scaled characteristic function on an interval, where sinc(x) = x −1 sin(x) for x = 0 and sinc(0) = 1i s the (unnormalized) sinc function, we can write (32)as R sinc ωI n,i /2 sinc ωI n, j /2 exp −i ωδ(I n,i , I n, j ) f X (ω)dω, where δ(I n,i , I n, j ) is the distance between the midpoints of intervals I n,i and I n, j at the initial stage n. Equation (33) can be interpreted as the Fourier transform of u(x) = f X (x) sinc xI n,i /2 sinc xI n, j /2 evaluated at δ(I n,i , I n, j ).
Since the sinc function is infinitely differentiable and the spectrum is Lipschitz continuous, results on the decay properties of Fourier transforms (Shibata and Shimizu 2001, Theorem 2.2) imply that, for i = j, terms of the form B n,i, j decay as O δ(I n,i , I n, j ) −1 . Hence, as in Knight et al. (2017), the further away the time points are, the less autocorrelation is present in the detail coefficients and as the rate of autocorrelation decay is of reciprocal order, it is faster than that of the original process assumed to have long memory (hence O(|τ | −β ) with β ∈ (0, 1)).
A similar argument as above applies for the pseudocovariance r X (t i − t j ) = E(X t i X t j ),as E(d j r d j k ) = Rψ j r (t) Rψ j k (s)r X (t − s) ds dt, (34) and concludes the proof. ⊓ ⊔

A.3 Proof of Proposition 3
As Cov(X t i , X t j ) = γ X (t i − t j ) and d j r =< X ,ψ j r >,i t follows that d j r has mean zero (as the original process is zero-mean) and in a similar manner to (29)wehave E(|d j r | 2 ) = Rψ j r (t) Rψ j r (s)γ X (t − s) ds dt, (35) where again we have dropped the basis index ℓ = 1, 2f o r notational brevity and we remind the reader that |· p| denotes the C-modulus. As before, we denote the associated interval length of the detail d j r by I r , j r . Using the recursiveness in the dual wavelet construction (Eqs. (15) and (22) This can be expanded as E(|d (1) jr | 2 ) = R R a r jr a r jrφr , jr (t)φ r , jr (s)γ X (t − s) ds dt − j∈Jr R R a r j a r jrφr , j (t)φ r , jr (s)γ X (t − s) ds dt − j ′ ∈Jr R R a r jr a r j ′φ r , jr (t)φ r , j ′ (s)γ X (t − s) ds dt + j∈Jr j ′ ∈Jr R R a r j a r j ′φ r , j (t)φ r , j ′ (s)γ X (t − s) ds dt.
As in Proposition 1, using Parseval's theorem we obtain that the above is a linear combination of terms of the form Rχ r , j (s)γ X (t − s)ds dt = R sinc(ωI r ,i /2) sinc(ωI r , j /2)e −i ωδ(I r ,i ,I r , j ) f X (ω)dω, (38) where recall that the hat notation denotes the Fourier transform of a function and δ(I r ,i , I r , j ) denotes the distance between the midpoints of intervals I r ,i and I r , j . Due to the artificial-level construction, the sequence of lifting integrals is approximately log-linear in the artificial level [see Knight et al. (2017) for details], i.e. for those points j r in the j ⋆ th artificial level, we have log 2 I r , j r = j ⋆ + Δ where Δ ∈{ − 1 + log 2 (α 0 ), log 2 (α 0 )} for some α 0 , thus I r , j r = R2 j ⋆ for some constant R > 0. Now suppose i = j and both points belong to the j ⋆ th artificial level. In Eq. (38), we make a change of variable η = ω R2 j ⋆ to obtain where α ∈ (0, 1), Γ is the Gamma function and M = 4c f Γ(−1 − α) sin(πα/2). If i = j are points from the same neighbourhood J r and both belong to the same artificial level j ⋆ , then their artificial scale measure will be the same. Performing the same change of variable as above, we obtain (as ( j ⋆ →∞) = 2 j ⋆ (α−1) c f R α−1 R sinc 2 (η/2)e −i η |η| −α dη = 2 j ⋆ (α−1) R α−1 4c f (2 α − 1) sin(π α/2)Γ (1 − α) = 2 j ⋆ (α−1) R α−1 N , where N = 4c f (2 α − 1) sin(πα/2)Γ (1 − α). All terms in (37) involve points from the same neighbourhood J r , and thus using (40), (41) together with the linearity of the integral operator, we have that where C is a constant depending on c f , R and α. A similar argument applies to the second basis and completes the proof.