Bayesian joint quantile autoregression

Quantile regression continues to increase in usage, providing a useful alternative to customary mean regression. Primary implementation takes the form of so-called multiple quantile regression, creating a separate regression for each quantile of interest. However, recently, advances have been made in joint quantile regression, supplying a quantile function which avoids crossing of the regression across quantiles. Here, we turn to quantile autoregression (QAR), offering a fully Bayesian version. We extend the initial quantile regression work of Koenker and Xiao (2006) in the spirit of Tokdar and Kadane (2012). We offer a directly interpretable parametric model specification for QAR. Further, we offer a p-th order QAR(p) version, a multivariate QAR(1) version, and a spatial QAR(1) version. We illustrate with simulation as well as a temperature dataset collected in Arag\'on, Spain.


Introduction
For time series data, autoregressive (AR) modeling is perhaps the most common approach.A lag one, AR(1), takes the form Y t = µ + ρ(Y t−1 − µ) + ϵ t , with ϵ t following a suitable zero-mean error distribution; a conditional mean is provided.By analogy, quantile autoregression (QAR) considers conditional quantiles.
An issue with quantile regression (QR) is the so-called quantile crossing problem.Modeling quantiles individually enables rich modeling for a given quantile but allows for crossing of quantiles across quantile level τ .For arbitrary values of the regressors, X, we can not ensure that the resulting modeled quantiles will increase in τ .Such modeling is referred to as multiple QR.Inference typically proceeds by minimizing a check loss function or, more formally, assuming an asymmetric Laplace (AL) error term.Examples of multiple QR with AL errors appear in Yu and Moyeed (2001), and Kozumi and Kobayashi (2011) present a Gibbs sampler model fitting implementation.Lum and Gelfand (2012) work in the context of spatially referenced data and extend the AL model to a spatial process.Castillo-Mateo et al. (2023) propose a very flexible spatial AL mixed effects QAR model.
Recent effort has focused on a joint QR modeling to avoid quantile crossing.Adopting restricted support for the regressors, X, the τ -quantile will increase monotonically over τ ∈ (0, 1).Foundational work appears in Tokdar and Kadane (2012) using a Gaussian process (GP) with follow on work in Das and Ghosal (2017) using splines.Reich et al. (2011) developed a spatial joint QR model through spatially varying regression coefficients using Bernstein polynomials.Yang and Tokdar (2017) propose a novel parameterization that characterizes any collection of noncrossing quantile planes over arbitrarily shaped convex predictor domains.This parameterization was extended to spatial data by Chen and Tokdar (2021) through a copula process but a non-spatially varying quantile function results.Joint modeling imposes strong restrictions on the class of permissible specifications; models outside of this class may be preferred.Koenker and Xiao (2006) offered an initial version of a joint QAR(p) model.Illustrating with p = 1, they consider where U t is a sequence of IID standard uniform random variables.The θ functions, from [0, 1] → R, need to be estimated.Provided that the right side of expression (1) is monotone increasing in U t , the τ conditional quantile function of Y t given y t−1 increases in τ and is Q Yt (τ | y t−1 ) = θ 0 (τ ) + θ 1 (τ )y t−1 .
Our contribution is to reconsider the work of Koenker and Xiao (2006) in the context of Tokdar and Kadane (2012), providing flexible joint QAR modeling in a 2 The QAR(1) case

The support of the data
For a noncrossing QAR specification we need to restrict the support of the time series data, {y * t : t = 1, . . ., T }, to a bounded interval on the real line. 1We take this interval to be [0, 1] and implement this by making a transformation of the data, where m < min y * t and M > max y * t .In fact, m and M are chosen such that min y t is close to but above 0 and max y t is close to but below 1.This enables the most flexibility for the quantile function under our proposed QAR modeling and we offer an automatic selection approach below.
Two points are important to note.First, we can not take m = min y * t and M = max y * t .The data must be in the interior of the unit interval in order to enable distinct quantiles as τ varies across (0, 1).Second, choosing m and M is merely a device for working on the unit interval.There is no connection between these values and the potential practical support of the y * t 's.Imposing bounding on the support is unavoidable for a valid linear specification of Q Yt (τ | y t−1 ) of the form θ 0 (τ ) + θ 1 (τ )y t−1 because the only nonintersecting lines under unbounded support are parallel lines.
A convenient "automatic" strategy for selecting m and M is as follows.The idea recalls basic results from the theory of order statistics.If we have T independent observations from a uniform distribution on (m, M ), {y * t : t = 1, . . ., T }, then [E(Y * (1) ) − m]/(M − m) = 1/(T + 1) and [E(Y * (T ) ) − m]/(M − m) = T /(T + 1).So we can say y * (1) ≈ (mT + M )/(T + 1) and y * (T ) ≈ (m + T M )/(T + 1).This gives two equations in two unknowns to solve for m and M .We obtain T − 1 and M = T y * (T ) − y * (1) Of course the Y * t 's are not independent, they do not come from a distribution on a bounded interval, and marginally, we would not expect them to follow a uniform distribution on (m, M ).We only implement a simple automatic bounding strategy.

Specification for the two monotone curves
Specifically, both η 1 (•) and η 2 (•) again must be strictly monotone from [0, 1] → [0, 1].A convenient class to work with are cdf's for continuous random variables with support [0, 1].In fact, a rich class would arise as probabilistic mixtures of such cdf's, leading to the general form η(τ A convenient class of F 's to work with are the cdf's of the two parameter Kumaraswamy (1980) distribution (also known as the minimax distribution, Jones, 2009).Specifically, the probability density function (pdf) and cdf are where x ∈ [0, 1] and a, b > 0. The Kumaraswamy distributions are a family with behavior similar to the beta distribution.However, for our purposes, they are much simpler to use especially in the context of simulation since the cdf and quantile function can be expressed in closed form, i.e., The flexibility of the Kumaraswamy distributions is shown in Section S1 of the Supplementary Information (SI) employing different combinations of parameters (a, b).
To work with the mixture form for η(τ ), we investigated two mixture strategies.The first lets K be small but assumes the a's and b's are unknowns.The second lets K be larger but adopts a fixed set of a's and b's, in the spirit of basis function forms.Specifically, we consider K Kumaraswamy distributions with medians k/(K + 1), respectively.In the former, with K = 2 we have a total of five parameters (two a's, two b's, and a λ) while in the latter, with K = 6 again we have five parameters (five λ's).Increasing the number of "basis" components in the specification of the η's need not provide better model performance.From considerable simulation experience, model performance is very sensitive to the choice of parameters in the mixture components.So, in the sequel, we work with K = 1 or 2 (QAR1K1 and QAR1K2, hereafter) and fit the a's and b's.As for priors, with K = 1, we consider log a 1 , log b 1 ∼ N (0, σ 2 ab ) with σ ab = 3, which gives a weak prior on the log-scale.With K = 2, we consider ab ) with σ ab = 1.5.Restricting λ 1 to (0, 1/2) avoids identification issues, while σ ab is taken smaller than in the K = 1 case to penalize values of a's and b's too small or large.Values of a's and b's that are close to zero or very large can cause negligible numerical errors in the rootfinder to generate a numerical overflow in the likelihood (see Equations 7 and 8 below) and thus degeneracy.

Likelihood evaluation and model fitting
An important feature of a valid joint specification of Q Yt (τ | y t−1 ) for all τ ∈ (0, 1), following Tokdar and Kadane (2012), is that it uniquely defines the conditional response density given y t−1 ∈ [0, 1].Specifically, this density is given by where τ yt−1 (y t ) solves y t = y t−1 η 1 (τ ) + (1 − y t−1 )η 2 (τ ) in τ and is numerically approximated to arbitrary precision via a one-dimensional rootfinder.We implement the hybrid rootfinding algorithm combining the bisection method, the secant method, and inverse quadratic interpolation, called Brent's method (Brent, 1973).Consequently, given the data at t = 1, y 1 , we can write a valid log-likelihood score where u t = τ yt−1 (y t ), y ⊤ = (y 1 , . . ., y T ) are all of the observed data, Ω are the model parameters, and the η's are the derivatives of the η's.
We implement an adaptive block-Metropolis sampler algorithm (Haario et al., 2001) to obtain Markov chain Monte Carlo (MCMC) samples from the posterior distribution of the parameters and to summarize the posterior distribution of the conditional quantile function.Furthermore, with a posterior realization of the model parameters and a given value of y t−1 , we can use (7) with discretization, to obtain a posterior realization of the density function that is driving the joint quantiles.Averaging over these realizations provides the posterior mean of the density.

Model comparison and simulation study
Working within our parametric Bayesian framework, for any τ , posterior samples of the model parameters, {Ω * b : b = 1, . . ., B}, produce posterior samples of the conditional quantile function for Essentially, for each Y t (with associated y t−1 ) and any τ , we obtain the posterior distribution of Q Yt (τ | y t−1 ; Ω).We use these posterior distributions along with the dataset, y, to offer model assessment.Specifically, we propose two metrics.One, denoted by pv , is created by comparing averaged estimates of p t (τ ) where δ τ (u) = u(τ − 1(u < 0)) is the check loss function associated with the AL distribution.Full details are provided in Section S2 of the SI.
In Section S3 of the SI we also present the results of a brief simulation study where the goals were (i) to illustrate parameter recovery under fitting for several models, (ii) to investigate model flexibility, i.e., performance when the sampling model is not the same as the fitting model, and (iii) to consider the effect of sample size with regard to (i) and (ii).similar construction for an autoregressive process of order p as follows.Define where the functions η 1 , . . ., η p+1 : [0, 1] → [0, 1] are monotonically increasing and the weights π 1 , . . ., π p are such that π j ≥ 0 and j π j = 1.It is easy to see that such Q Yt (τ | y t−1 , . . ., y t−p ) is monotonically increasing in τ ∈ [0, 1] for every y t−1 , . . ., y t−p ∈ [0, 1].
In particular, for QAR(2), let τ, π ∈ [0, 1].Then, define where the three η functions are all strictly increasing, using forms as above.Rewriting the expression as both autoregressive coefficients belong to [−1, 1] and need not be increasing in τ .We fit this QAR(2) model to our real data in Section 6.3.In fact, we only attempt this with K = 1 mixture components (seven parameters) to keep the model simple.Further, the second autoregressive term results are not influential for our data.Also, we choose log a's and log b's to follow a N (0, 1.5 2 ) prior and π ∼ U (0, 1) as a noninformative prior for π.
4 Multivariate QAR(1) Often a collection of dependent times series is gathered over a common time window.For instance, our illustration below considers the dependent pairs {(y max t , y min t ) : t = 1, . . ., T }, the daily maximum and minimum temperature for day t at a site.In fact, the collection of time series might be spatially referenced (leading to a spatial copula model construction, as developed in the next section).What we have is the quantile analogue of usual multivariate AR for time series.Implementation using the class of joint QAR(1) models we have proposed has not appeared in the literature.Our interest is in the quantile function for each time series.We are asking about the amount of dependence between quantile levels regarding the marginal quantile functions.
Here, we illustrate with the bivariate case where we have two models each defined as in (1), introducing dependence in the two time series by making the associated U t 's dependent through T − 1 IID 2-dimensional Gaussian copulas.This specification captures the acknowledged dependence between the pair of time series.We postpone to Section 5 the details of modeling using copulas; in particular, that section develops the form of the general n-dimensional joint density.The only detail that we advance here is that the correlation matrix associated with the copulas contains 1's on the diagonal and ρ on the off-diagonal, where ρ ∼ U (−1, 1) measures the correlation between series.
Apart from introducing dependence through U max t and U min t , we could introduce dependence in the η's.For instance, using Kumaraswamy cdf's, under the K = 1 case, we consider the pairs log a max j and log a min j and the pairs log b max j and log b min j (j = 1, 2) to be bivariate normal.In our data we found little or no correlation between the parameters of the two time series, so in subsequent analyzes we will consider them independent.We do not pursue this case further here except to note the analogy with dependent responses in linear regression models.Introducing dependence through the U t 's is analogous to introducing dependence through the errors in the linear regression while introducing dependence through the η's is analogous to introducing dependence in the mean structure through shared parameters.
An example is presented in Section 6.4.Again, with K = 1, this yields four η's, i.e., four independent log a's and four independent log b's, each following a weak, say N (0, 3 2 ) prior, as well as the copula parameter.As a by-product, we show the induced bivariate conditional pdf (arising from the bivariate analogue of Equation 7) for (Y max t , Y min t ) with some choices for the y t−1 's.

Spatial QAR(1)
In the spatial setting, we consider spatial point-referenced time series data.Here, Y t (s) denotes the observation for time t = 1, . . ., T at location s ∈ D, where D ⊂ R 2 is the study region.We have a time series at each of the locations, {s 1 , . . ., s n }, say, the locations of the monitoring stations.The joint spatial QAR model is given by where the θ functions are quantile and spatially varying.Chen and Tokdar (2021) propose to model the spatial dependence of the realizations in a QR model using a spatial copula process.Generalizing it to our model, the vectors (U t (s 1 ), . . ., U t (s n )) ⊤ follow an independent copula distribution for every t.Supplementing Chen and Tokdar (2021), in ( 12) we introduce spatially varying coefficients rather than global coefficients.As a consequence, we have dependence in the time series realizations as well as spatially varying quantile functions.
We take the ϕ's to be fixed values, according to the spatial scale, because it is usually difficult to estimate them from the data and typically interest focuses on the σ 2 's, the spatial uncertainties (Banerjee et al., 2014).Specifically, we fix ϕ = 3/d max , with d max the maximum distance between any pair of spatial locations.Thus, the spatial GP's are only indexed by a mean and a variance parameter.We choose the priors a j , b j , log σ 2 aj , log σ 2 bj ∼ N (0, 3 2 ) (j = 1, 2).

The spatial copula process
A copula is a multivariate cdf for which the marginal distribution of each variable is U (0, 1).Copulas are used to model the dependence between random variables.Particularly, Sklar's theorem (Sklar, 1959) states that any multivariate joint pdf can be written in terms of univariate marginal pdf's and a copula which describes the dependence structure between the variables.Gaussian spatial copulas enable computational advantages, e.g., ease of parameter estimation and scalability with sample size.For a given correlation matrix R, the n-dimensional Gaussian copula function with parameter matrix R becomes where is the joint cumulative distribution function of a multivariate normal distribution with zero-mean vector and covariance matrix R.
According to Xue-Kun Song (2000), the associated copula density is With regard to the copula model for ( 12), we take the processes U t (s)'s to follow a Gaussian copula for each t, induced by a stationary spatial GP.In the spirit of Chen and Tokdar (2021), we define The process W t (s) captures spatial dependence while ϵ t (s) is independent pure error.
The parameter γ ∈ [0, 1] determines the proportion of spatial and independent variation.When γ = 1, the specification for Z t (s) is purely spatial.When γ = 0, we have an independent noise model.With this approach, the Gaussian copula density has correlation matrix R ≡ γR(ϕ) + (1 − γ)I n where R(ϕ) is the n × n correlation matrix induced by ρ(s, s ′ ; ϕ).To address the final copula piece of our model, we fix ϕ as above, and adopt γ ∼ U (0, 1) as a noninformative prior for γ.

Likelihood evaluation
We are interested in the likelihood under model ( 12) using ( 14) and ( 15).It is convenient to first obtain the joint distribution for . By Sklar's theorem, the joint conditional density of responses, Y, given initial time's data, y 1 , can be partitioned into a marginal part and a copula part, where the cdf F Yt(si) corresponds to the pdf f Yt(si) and c Φ is the Gaussian copula density in ( 14).As in Section 2.3, we evaluate f Yt(si) and F Yt(si) using: where Then, the log-likelihood score of the model parameters Ω can be expressed by . Finally, note that, for the calculation of the loglikelihood, the value of the u t (s i )'s must be solved for, so the number of rootfinders needed at each iteration of the MCMC is n(T − 1).As a result, likelihood evaluation is expensive, leading to long MCMC run times.

Spatial interpolation
The quantile Q Yt(s) (τ | y t−1 (s)) is a function of process realizations.Posterior samples for the hyperparameters are available from the model fitting.Posterior samples for the GP's are available, using posterior samples of the hyperparameters, through usual Bayesian kriging (Banerjee et al., 2014).This yields prediction of a j (s 0 ) and b j (s 0 ) (j = 1, 2) at a new s 0 ∈ D, enabling spatially varying quantile functions.Therefore, we can interpolate conditional quantiles to any desired location in the study region given any proposed or reference value for the previous day's temperature at that location.If we do this over a sufficiently spatially resolved grid, we can obtain the posterior mean at each point and show the posterior τ conditional quantile surface for the given day.We bring in daily minimum temperatures for the bivariate QAR(1) case.The data is provided by the State Meteorological Agency (AEMET, in its Spanish acronym).Castillo-Mateo et al. ( 2022) provide exploratory analysis and spatial hierarchical modeling for this dataset.We analyze responses for 2015, an interesting year because the summer was especially hot in Europe (Dong et al., 2016).There were numerous locations with record-breaking temperatures in July 2015 and the heat was maintained over time.The monthly average value of temperatures was a record in July 2015 for 6 of the 18 locations and in the entire region it was among the 10 hottest monthly averages.We restrict analysis to observations from May, June, July, August, and September (denoted as MJJAS), i.e., the hottest months of the year, resulting in T = 153 days.The location of the 18 observatories is shown in Figure 1 and their time series in Figure S4 of the SI.
We begin with a model comparison using QAR(1) and QAR(2) models for all locations.Then, we analyze two illustrative locations within the region, Pamplona and Zaragoza, separately.Subsequently, we implement the bivariate QAR(1) model to the daily maximum and minimum temperature series in Zaragoza.Finally, we implement the general model for spatial QAR(1) with all the locations.Before model fitting, we scale each of the temperature time series to (0, 1) using the transformation in (3) with m and M in (4).We adopt site-level values for m and M .

The QAR(1) case
Table 1 shows, averaged across locations, the metrics of model adequacy p2 and model comparison R1 defined in Section S2 from the SI for the QAR1K1 and QAR1K2 models, and the model from Koenker and Xiao (2006) fitted under our Bayesian framework using the density in (7).Table S6 in the SI shows the metrics for each location.For this latter model, denoted as KX2006, we also consider a location parameter µ in the intercept, i.e., θ 0 (τ ) = µ+σΦ −1 (τ ) and θ 1 (τ ) = min{γ 0 +γ 1 τ, 1} for γ 0 ∈ (0, 1) and γ 1 > 0. With KX2006 we work on the original scale of the data since they are all positive.We choose the priors µ ∼ N (0, 10 2 ), log σ, log γ 1 ∼ N (0, 3 2 ), and γ 0 ∼ U (0, 1).The R1 does not discriminate much between the proposed models, i.e., the autoregressive term explains much more variability than the difference in specification between the models.However, our proposed models have a slightly higher performance, 0.365, than the KX2006 model, around 0.34.Also, the p2 directly measures how well the quantiles are captured, and its discriminative capacity is much higher.While QAR1K1 obtains a value of 0.633, adding a second component to the mixing improves this measure to 0.402.For its part, the KX2006 model obtains the worst value, 0.683, indicating an overall poorer fitting of the quantiles.
Figure 2 shows the posterior mean of the functions θ 0 and θ 1 in Pamplona and Zaragoza for the models QAR1K1 (dashed) and QAR1K2 (solid).Note that we could recover the intercepts on the original scale as θ * 0 (τ ) = m(1 − η 1 (τ )) + M η 2 (τ ) and the autoregressive coefficients remain invariant.Further, θ 1 is not monotonic; this aspect of temperature dependence with respect to the previous day's temperature was also observed by Castillo-Mateo et al. (2023).It cannot be reproduced by KX2006.In Pamplona, the QAR1K2 model (the best) estimates a lower autoregressive coefficient than the QAR1K1 for τ ∈ (0.1, 0.7).In Zaragoza, similar curves appear for the two values of K, as shown by p2 and R1 .
Figure 3 shows the posterior mean of the conditional quantile functions Q Yt (τ | y) for three situations where y is the empirical τ marginal quantile for τ = 0.1, 0.5, 0.9; the legend shows the values that are conditioned on both the original scale and the (0, 1) scale.The smallest values of θ 1 are in extreme τ 's, this means that the previous day's temperature is less influential for the extreme quantiles.In fact, the conditional quantiles in Figure 3 overlap for τ 's near 0 or near 1.
Figure 4 shows the posterior mean of the conditional density function in (7) under the same conditions as Figure 3. Pamplona presents different shapes in f Yt (y t | y) for different values of y.The distribution is asymmetrical with positive skewness if we condition on a small value for the previous day's temperature, and negative skewness if we condition on a big value.A general pattern is common in the region, the conditional distribution conditional on the 0.9 marginal quantile is more concentrated than those conditional on the 0.1 quantile.Figures S5, S6, S7, and S8 in the SI present the plots for the 18 locations.

The QAR(2) case
Table 1 uses the criteria p2 and R1 for the QAR(2) model with K = 1 (QAR2K1).The previous subsection showed that including a first lag improved the performance of the model with respect to an empirical null model.However, including a second lag does not increase the value of R1 with respect to a QAR(1) model.On the other hand, Fig. 3 Posterior mean of the quantile function Q Yt (τ | y) vs. τ for QAR1K1 (dashed) and QAR1K2 (solid).Here, y is the empirical τ marginal quantile for τ = 0.1 (blue), 0.5 (black), 0.9 (red).Pamplona (left) and Zaragoza (right), MJJAS, 2015.
the measure of p2 is somewhat better for QAR2K1 than for QAR1K1 but it is still inferior to the QAR1K2 case.Since QAR2K1 does not improve performance, and, as Here, y is the empirical τ marginal quantile for τ = 0.1 (blue), 0.5 (black), 0.9 (red).Pamplona (left) and Zaragoza (right), MJJAS, 2015.we will see below, there is no evidence that the term θ 2 (τ ) is different from zero for any τ across most locations, there seems to be no value in exploring a QAR(2) model with K = 2.
Figure 5 shows the θ functions in the QAR2K1 model for Pamplona and Zaragoza (see Figure S9 in the SI for all locations).The θ 0 and θ 1 functions have a shape very similar to the QAR1K1 case.The θ 2 functions have values that are essentially centered at zero in most locations, giving more evidence that it is not necessary to introduce a lag of order 2 in the model.However, there are four locations with a coefficient slightly away from zero; Buñuel and La Sotonera have a value of θ 2 (τ ) close to 0.2 for nonextreme τ 's while Huesca and La Puebla de Híjar have similar behavior with values around 0.1.

Multivariate QAR(1)
Here, we fit the multivariate QAR(1) model (MQAR1K1) to the daily maximum and minimum temperature series at Zaragoza, {(y max t , y min t ) : t = 1, . . ., T }.The same analyses were developed for Pamplona and Daroca, but with different conclusions.Figure 6 shows the θ functions for the y max t (red) and y min t (blue) series.We see different patterns for θ max 1 and θ min 1 ; y max t shows high autocorrelation for high quantiles while y min t has less persistence for those quantiles.For Zaragoza, the posterior mean of ρ is 0.32 with 95% credible interval (0.17, 0.45), indicating the need to include dependence in the quantile levels of both series.For Pamplona, the posterior mean of ρ is 0.06 with 95% credible interval (−0.11, 0.23).
Here, independent models for the conditional quantiles could be adopted.An explanation is the frequent appearance of fresh wind from the northwest during the night in Pamplona, resulting from proximity to the Cantabrian Sea.depending on the previous day's temperatures.The conditional posterior distribution is not symmetric, with a different mean vector depending on the conditioning temperatures; the variability of the distribution is smaller when it is conditioned on high quantiles.

Spatial QAR(1)
The spatial QAR model is fitted to the series of MJJAS in 18 locations in Aragón for the year 2015.The posterior mean of γ, the proportion of spatial dependence in (15), is 0.955 with 95% credible interval (0.934, 0.973) indicating very strong spatial ) conditioned on (y max , y min ) for MQAR1K1.Here, (y max , y min ) is equal to the respective empirical marginal quantiles for τ = 0.5 (above), 0.9 (below) for the maximum; and τ = 0.5 (left), 0.9 (right) for the minimum.Zaragoza, MJJAS, 2015.dependence in the quantile levels of the temperature series.Figure S12 in the SI provides maps of the posterior mean surface of the model GP's.We notice that b 1 (s) and b 2 (s) show approximately opposite spatial behavior since b 1 (s) has the highest values where b 2 (s) has the lowest, in the central and southeastern areas.Figure S13 of the SI shows boxplots of the posterior distribution of the GP's at each observed location; locations are sorted by elevation.The results suggest that the GP of a 2 (s) might be not necessary since the boxplots in the 18 locations have very similar ranges.The spatial variability of a 1 (s) is higher and, although it is not related to the elevation, it could be related to the distance to the sea.
The posterior distribution of θ 1 (τ ; s), which captures the autoregressive structure, is summarized using the same type of plots.Figure S14 of the SI shows boxplots presenting the posterior distribution of θ 1 (τ ; s) at the observed locations while Figure 8 the maps of the posterior mean surface of θ 1 (τ ; s), both for τ = 0.05, 0.50, 0.95.The spatial GP's in the parameters of the Kumaraswamy distribution allow the model to fit different spatial patterns in each τ .The results show that the posterior mean of θ 1 (τ ; s) is higher in the central quantiles.The spatial pattern of θ 1 (τ ; s) is not symmetric around τ = 0.5 and, e.g., values of θ 1 (0.95; s) in the Pyrenees and northwestern areas are smaller than θ 1 (0.05; s) in the same areas.Although θ 1 (τ ; s) tends to be lower in locations with higher elevation, its spatial pattern cannot be explained by elevation alone.Consequently, the spatial GP's cannot be replaced with an elevation fixed effect.
The spatial joint model can also be used to estimate parameters related to the conditional distribution, e.g., conditional quantiles at unobserved locations.As a brief example, Figure S15 of the SI shows this through maps of the posterior mean of Q Yt(s) (τ | y) for τ = 0.05, 0.50, 0.95, and y = 0.05, 0.50, 0.95.If it were desired to obtain the quantiles on the original scale of the data rather than the scale (0, 1), we could consider a kriging of m(s) and M (s).With the same kriging procedure we could condition on values y(s)'s relative to a certain empirical marginal quantile for each location.
The spatial modeling here is primarily illustrative.For instance, the assumption of asymptotic tail independence, imposed by the Gaussian copula, may not be suitable.Examination of alternative copulas is beyond the scope of this work.

Summary and future work
We have presented consequentially expanded modeling for joint (non-quantile crossing) QAR.In particular, we have characterized the QAR(1) setting in a way that allows for a more flexible autocorrelation structure than the one in the seminal paper by Koenker and Xiao (2006).We have extended this to the QAR(p) case.We have offered a novel multiple time series version using a Gaussian copula.We have elaborated a spatial version, using a GP copula based upon a GP in conjunction with four additional GP's.This model enables spatially varying quantile functions.Our modeling is entirely parametric through the use of the Kumaraswamy distributions.A software implementation of our methods is available as the R-package "QAR" through GitHub: https://github.com/JorgeCastilloMateo/QAR.
We have illustrated the above contributions through time series of daily temperatures from sites in Aragón, Spain.The joint QAR model, with greater flexibility in the modeling of the θ functions, allows us to capture autoregression structure in daily temperature data, which is not strictly increasing in τ , but decreasing in both tails.
A critical challenge in employing this work is model fitting.We can make specifications as rich as needed through the use of probabilistic mixtures of Kumaraswamy cdf's.However, it is well-known that model fitting employing MCMC with mixture specifications is often poorly identified.This issue is compounded in our case by the fact that calculation of the likelihood requires constant use of a one-dimensional rootfinder.Ongoing work is attempting to address these computational difficulties.
We digress briefly to note a simple approximation strategy to incorporate regressors, e.g., seasonality, into our joint QAR approach.Suppose we introduce a regression structure, µ t into the QAR(1) and estimate by μt , creating residuals r t = Y t − μt .Then, we could apply the above methodology to obtain the QAR(1) for r t .Our strategy for selecting m and M can be applied to residuals.More precisely, let r t = θ (0) (U t ) + θ (1) (U t )r t−1 .This would yield the conditional quantile function, Q rt (τ | r t−1 ) = θ (0) (τ ) + θ (1) (τ )r t−1 .Solving for the quantile function for Y t we obtain This approximation can be criticized for two reasons: (i) we are creating μt as if we were fitting a usual AR( 1) and (ii) the resulting quantiles are not coherent since μt is a function of {Y t : t = 1, . . ., T }.The QAR( 1) is not defined until the end of the observation window.Coherent implementation of covariates in the QAR setting, seeking to bridge our modeling with the work of Yang and Tokdar ( 2017) is in progress.Sections 4 and 5 could be combined to build a bivariate spatial QAR model for daily max and daily min temperature.Another challenge for the multivariate and spatial modeling would be to consider alternative copula choices, e.g., t-copulas in order to allow tail dependence for extreme quantiles.improved whale optimization algorithm for probabilistic short-term wind speed prediction.Renewable Energy 197: 668-682.https://doi.org/10.1016/j.renene.2022.07.123 .

S2 Model adequacy and comparison
Working within our parametric Bayesian framework, for any τ , posterior samples of the model parameters, {Ω * b : b = 1, . . ., B}, produce posterior samples of the conditional quantile function for Y t , Q Yt (τ | y t−1 ; Ω * b ).Essentially, for each Y t (with associated y t−1 ) and any τ , we obtain the posterior distribution of Q Yt (τ | y t−1 ; Ω).We use these posterior distributions along with the dataset, y, to offer model assessment.
We propose two novel approaches.First, for any y, consider 1(y < Q Yt (τ | y t−1 ; Ω)) where 1 denotes the indicator function.Then, let p t (τ i.e., the posterior probability that Q Yt (τ | y t−1 ; Ω) exceeds y t .Suppose we compute p(τ ) ≡ T t=2 p t (τ )/(T − 1).We note that for any τ and Y regardless of its distribution, provides a standardized deviation form as a dimensionless measure of well the quantile function under the model is capturing conditional quantiles for the given time series.We propose this as a (global ) measure of model accuracy.With a minimum value of zero, a smaller pv indicates better accuracy of the model.We would approximate the integral by discretizing τ , in particular, we consider τ ∈ {0.01, 0.02, . . ., 0.99}.As a second measure, we turn to the check loss function, usually employed as an optimality function to obtain the τ empirical quantile (Koenker and Bassett, 1978).Here, we adopt δ τ (u) = u(τ − 1(u < 0)), the check loss function associated with the AL distribution.Again, from the posterior distribution of Q Yt (τ | y t−1 ; Ω), for any Y t (with associated y t−1 ) and τ , we can obtain ∆ t (τ As above suppose we compute ∆(τ ) ≡ T t=2 ∆ t (τ )/(T − 1).Then, for a given τ , ∆(τ ) provides an average discrepancy for the τ quantile function.The smaller the value, the better the model performance.Then, we propose to weight ∆(τ ), to provide a global measure of model performance.We propose this as a relative measure of model performance in making model comparison.The weighting function, ω(τ ), compensates for the variation in mean of ∆(τ ) across τ .Again, we would approximate the integral by discretizing τ .
For the weight function, we consider This choice leads to a measure that is closely related to the R 1 (τ ) metric by Koenker and Machado (1999).The R 1 (τ ) measure is essentially, 1 − ω(τ | y)∆(τ ).This measure is viewed as an analogue of R 2 for the classical residual sum of squares, i.e., the check loss function for quantiles replaces the least-squares loss function and the τ empirical marginal quantile Q emp Y (τ ) replaces the sample mean.With a maximum value of 1, the best model performance is reached at this maximum.Then, provides a dimensionless global measure of model performance which can be used for model comparison.

S3 Simulation study
To illustrate parameter recovery under the modeling in Section 2 of the Main Manuscript, we perform a simulation study where we consider QAR(1) models with K = 1 and K = 2 (QAR1K1 and QAR1K2) Kumaraswamy cdf's in the probabilistic mixture from Section 2.2.1, and scenarios with different combinations of parameter values.For each scenario we simulate B = 100 datasets of length T = 150 and T = 500 time points, using the model in ( 5) from the Main Manuscript with the η's again specified as probabilistic mixtures.For each simulated dataset, a QAR(1) model with K equal to the value used in the data generating process is fitted.First, 90% credible intervals are computed for θ 0 (τ ) and θ 1 (τ ) in a grid of values of τ .Then, the coverage (CVG) of each value of each function is computed as the proportion of the B computed credible intervals that contain the corresponding true values.
The results are satisfactory, with the average of the CVG's close to the nominal level of 0.90, varying between 0.86 and 0.92 in all the scenarios for both T = 150 and 500.Not only the averages but also the individual CVG's for each τ , even for extreme quantiles (τ = 0.01, 0.99), are close to 0.90 (see Tables S2 and S4).It is noteworthy that Scenarios SC3, SC4 and SC6 could correspond to a usual correlation structure in climate and environmental data, with a strong positive dependence in the central values of τ that weakens at the extremes.
To study the flexibility of the QAR1K1 model, Table S5 summarizes the metrics described in Section S2, obtained from fitting the QAR1K1 and QAR1K2 models to the previous data series generated with QAR1K2 models.The values of R1 and p2 are the metrics averaged across the B = 100 simulations.QAR1K1 models are quite flexible and their metrics are only slightly poorer than QAR1K2 models.Note that series generated under Scenario SC5 are independent and consequently R1 is expected to be zero.S2: The 90% CVG of θ 0 (τ ) and θ 1 (τ ) (τ = 0.01, 0.50, 0.99) in QAR1K1 models fitted to Scenarios SC1-SC4.

Figure 7
shows level curves of the posterior conditional joint density of the vector (Y max t , Y min t ) given the previous day's max and min temperatures, in Zaragoza (see Figures S10 and S11 in the SI for Pamplona and Daroca).The conditioning values are empirical marginal quantiles of Y max t and Y min t .The first row conditions on the quantile τ = 0.5 (30.6 • C) and the second row on the quantile τ = 0.9 (37.0 • C) of Y max t , and the same quantiles of Y min t , for the first (17.2• C) and second (21.8 • C) columns.The different patterns observed in the plots reveal a different relation between Y max t and Y min t

Fig. 7
Fig. 7 Posterior mean of the density function of (Y max t , Y min t

Figure S2 :
Figures S1 and S2 show the pdf and the cdf of the Kumaraswamy distribution for different combinations of parameters (a, b).

Figure S4 :
Figure S4: Daily maximum temperature time series of the 18 locations in MJJAS, 2015.

Figure S11 :
Figure S11: Posterior mean of the density function of (Y max t

Figure S13 :
Figure S13: Boxplots summarizing the posterior distribution of a 1 (s), b 1 (s), a 2 (s), and b 2 (s) at the observed locations sorted by elevation.

Table S3 :
Values of the parameters of the QAR1K2 models used to generate data under Scenarios SC5-SC7.(First subscript indicates the mixture function η j and second subscript indicates the k-th component of that mixture to which the parameter corresponds.)

Table S5 :
Adequacy and comparison metrics for QAR1K1 and QAR1K2 models in Scenarios SC5-SC7.(Values are averaged across the B = 100 simulations.)

Table S6 :
Adequacy and comparison metrics for QAR1K1, QAR1K2, QAR2K1, and KX2006 models for the 18 locations and averaged across locations.