Goodness-of-fit testing for the marginal distribution of regime-switching models with an application to electricity spot prices
DOI: 10.1007/s10182-012-0202-9
- Cite this article as:
- Janczura, J. & Weron, R. AStA Adv Stat Anal (2013) 97: 239. doi:10.1007/s10182-012-0202-9
Abstract
This paper complements a recently published study (Janczura and Weron in AStA-Adv Stat Anal 96(3):385–407, 2012) on efficient estimation of Markov regime-switching models. Here, we propose a new goodness-of-fit testing scheme for the marginal distribution of such models. We consider models with an observable (like threshold autoregressions) as well as a latent state process (like Markov regime-switching). The test is based on the Kolmogorov–Smirnov supremum-distance statistic and the concept of the weighted empirical distribution function. The motivation for this research comes from a recent stream of literature in energy economics concerning electricity spot price models. While the existence of distinct regimes in such data is generally unquestionable (due to the supply stack structure), the actual goodness-of-fit of the models requires statistical validation. We illustrate the proposed scheme by testing whether commonly used Markov regime-switching models fit deseasonalized electricity prices from the NEPOOL (US) day-ahead market.
Keywords
Regime-switching, Energy economics, Goodness-of-fit, Weighted empirical distribution function, Kolmogorov–Smirnov test
1 Introduction
Regime-switching models have attracted a lot of attention in recent years. Their flexible specification, which allows for abrupt changes in model dynamics, has made them popular not only in econometrics (Choi 2009; Hamilton 1996b; Lux and Morales-Arias 2010), but also in fields as diverse as traffic modeling (Cetin and Comert 2006), population dynamics (Luo and Mao 2007), river flow analysis (Vasas et al. 2007) and earthquake counts (Bulla and Berzel 2008). This paper is motivated by yet another stream of literature: electricity spot price models in energy economics (Bierbrauer et al. 2007; De Jong 2006; Erlwein et al. 2010; Huisman and de Jong 2003; Janczura and Weron 2010, 2012; Karakatsani and Bunn 2008, 2010; Mari 2008; Misiorek et al. 2006; Weron 2009). Regime-switching models have seen extensive use in this area due to their relative parsimony (a prerequisite in derivatives pricing) and their ability to capture the unique characteristics of electricity prices (in particular, the spiky and non-linear price behavior). While the existence of distinct regimes in electricity prices is generally unquestionable (being a consequence of the non-linear, heterogeneous supply stack structure in the power markets, see e.g., Eydeland and Wolyniec 2012; Weron 2006), the actual goodness-of-fit of the models requires statistical validation.
However, recent work concerning the statistical fit of regime-switching models has been mainly devoted to testing parameter stability versus the regime-switching hypothesis. Several tests have been constructed for the verification of the number of regimes. Most of them exploit the likelihood ratio technique (Cho and White 2007; Garcia 1998), but there are also approaches related to recurrence times (Sen and Hsieh 2009), likelihood criteria (Celeux and Durand 2008) or the information matrix (Hu and Shin 2008). Specification tests to detect autocorrelation and ARCH effects were proposed by Hamilton (1996a), based on the score function technique, and more recently by Smith (2008), utilizing the Rosenblatt transformation (see also Sect. 3.3). Smith found that the performance of the Ljung–Box test improved when used on the normally distributed Rosenblatt transformation. However, the considered Markov regime-switching models were relatively simple and had two states differing only in mean. Interestingly, the Rosenblatt transformation was used earlier for evaluating density forecasts of regime-switching (Berkowitz 2001; Diebold et al. 1998; Haas et al. 2004) and stochastic volatility models (Kim et al. 1998), typically in a risk management context.
On the other hand, to the best of our knowledge, procedures for goodness-of-fit testing of the marginal distribution of regime-switching models have not been derived to date (with the exception of Janczura and Weron 2009, where an ewedf-type test was introduced in the context of electricity spot price models, see Sect. 3.2.1 for details). With this paper we want to fill the gap and propose empirical distribution function (edf) based testing procedures, built on the Kolmogorov–Smirnov test, that are dedicated to regime-switching models with observable as well as latent state processes. In contrast to the approaches based on the Rosenblatt transformation, the techniques proposed in this paper allow for testing the fit in the individual regimes as well as of the whole model. This can be advantageous in many situations, as we additionally obtain information on which regimes are correctly and which are incorrectly specified. The derivation of the tests is not straightforward and, in the case of a latent state process, requires an application of the concept of the weighted empirical distribution function (wedf). Finally, we should also note that the term “marginal distribution” does not mean that the proposed tests ignore the dynamic regime structure. On the contrary, the dynamic structure is taken into account when constructing the residuals used in the testing procedures.
The paper is structured as follows: in Sect. 2 we describe the structure of the analyzed regime-switching models and briefly explain the estimation process (for details we refer to an article recently published in AStA; Janczura and Weron 2012). In Sect. 3, we introduce goodness-of-fit testing procedures appropriate for regime-switching models both with observable and latent state processes. Next, in Sect. 4 we provide a simulation study and check the performance of the proposed techniques. Since the motivation for this paper comes from the energy economics literature, in Sect. 5 we show how the presented testing procedure can be applied to verify the fit of Markov regime-switching models to electricity spot prices. Finally, in Sect. 6 we conclude and outline future work.
2 Regime-switching models
2.1 Model definition
2.2 Estimation
- Step 1 Denote the observation vector by \(\mathbf x_T =(x_1,x_2,\ldots ,x_T)\). For a parameter vector \(\theta ^{(n)}\) compute the conditional probabilities \(P(R_t = i|\mathbf x_T ;\theta ^{(n)})\), the so-called ‘smoothed inferences’, for the process being in regime \(i\) at time \(t\).
- Step 2 Calculate new and more exact maximum likelihood estimates \(\theta ^{(n+1)}\) using the log-likelihood function, weighted with the smoothed inferences from Step 1, i.e.,
$$\begin{aligned} \log \left[L(\theta ^{(n+1)})\right]=\sum _{i=1}^2\sum _{t=1}^T P(R_t = i|\mathbf x_T ;\theta ^{(n)})\log \left[f_i(x_t|\mathbf x_{t-1} ;\theta ^{(n+1)})\right], \end{aligned}$$
where \(f_i(x_t|\mathbf x_{t-1} ;\theta ^{(n+1)})\) is the conditional density of the \(i\)th regime, and update the transition probabilities:
$$\begin{aligned} p^{(n+1)}_{ij}= \frac{\sum _{t=2}^T P(R_t =j, R_{t-1}=i|\mathbf x _{T};\theta ^{(n)})}{\sum _{t=2}^{T}P(R_{t-1}=i|\mathbf x _{T};\theta ^{(n)})}. \end{aligned}$$
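The transition probability update in Step 2 can be illustrated with a minimal sketch. It assumes that the smoothed probabilities \(P(R_t=i|\mathbf x_T)\) and the joint smoothed probabilities \(P(R_t=j,R_{t-1}=i|\mathbf x_T)\) have already been computed in Step 1; the function name, the argument layout and the toy numbers are hypothetical and serve only as an illustration.

```python
def update_transition_probs(joint, smoothed):
    """Re-estimate p_ij from smoothed inferences (illustrative sketch).

    smoothed[t][i] ~ P(R_t = i | x_T) for t = 0, ..., T-1 (0-based time).
    joint[t][i][j] ~ P(R_{t+1} = j, R_t = i | x_T) for t = 0, ..., T-2.
    """
    n = len(smoothed[0])
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Denominator: sum over all time points that have a successor.
        denom = sum(s[i] for s in smoothed[:-1])
        for j in range(n):
            p[i][j] = sum(jt[i][j] for jt in joint) / denom
    return p


# Toy input (assumed values): T = 3 observations, two regimes.
smoothed = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
joint = [
    [[0.55, 0.35], [0.05, 0.05]],
    [[0.15, 0.45], [0.05, 0.35]],
]
print(update_transition_probs(joint, smoothed))
```

Each row of the returned matrix sums to one whenever the joint probabilities are consistent with the smoothed ones, which is a convenient sanity check after every iteration.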
3 Goodness-of-fit testing
In this section, we introduce a goodness-of-fit testing technique that can be applied to evaluate the fit of regime-switching models. It is based on the Kolmogorov–Smirnov (K-S) goodness-of-fit test and checks whether the null hypothesis \(H_0\), that the observations come from the distribution implied by the model specification, can be rejected. The procedure can be easily adapted to other empirical distribution function (edf) type tests, like the Anderson–Darling test.
3.1 Testing in case of an observable state process
3.1.1 Specification I
Further, observe that transformation \(h(X_{t+k,1},X_{t,1},k)\) is based on subtracting the conditional mean from \(X_{t+k,1}\) and standardizing it with the conditional variance. Indeed, \((1-\beta )^k X_{t,1} + \alpha \frac{1-(1-\beta )^k}{\beta }\) is the conditional expected value of \(X_{t+k,1}\) given \((X_{1,1},X_{2,1},\ldots ,X_{t,1})\) and \(\sigma ^2\frac{1-(1-\beta )^{2k}}{1-(1-\beta )^2}\) is the respective conditional variance.
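As an illustration, the conditional mean and variance given above can be turned into a standardized residual. The sketch below assumes that transformation (7), which is not reproduced in this excerpt, divides the centered observation by the conditional standard deviation (the square root of the conditional variance); the function name and the example values are hypothetical.

```python
import math


def standardized_residual(x_next, x_now, k, alpha, beta, sigma):
    """Standardize x_{t+k} given x_t for the mean-reverting regime (sketch).

    Uses the conditional mean and variance quoted in the text.
    Assumes 0 < beta < 2, so that the variance ratio is well defined.
    """
    phi = 1.0 - beta
    cond_mean = phi**k * x_now + alpha * (1.0 - phi**k) / beta
    cond_var = sigma**2 * (1.0 - phi ** (2 * k)) / (1.0 - phi**2)
    return (x_next - cond_mean) / math.sqrt(cond_var)


# For k = 1 the expression reduces to (x_next - alpha - (1 - beta) * x_now) / sigma.
print(standardized_residual(11.6, 3.0, 1, alpha=10.0, beta=0.8, sigma=2.0))
```

For consecutive observations (\(k=1\)) the conditional mean is \(\alpha +(1-\beta )X_{t,1}\) and the conditional variance is \(\sigma ^2\), so the residual is the usual one-step AR(1) innovation.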
The goodness-of-fit of the marginal distribution of the individual regimes can be formally tested. For the mean-reverting regime, \(F\) is the standard Gaussian cdf and \((y_1,y_2,\ldots ,y_{n_1})\) is the subsample of the standardized residuals obtained by applying transformation (7), while for the second regime, \(F\) is the model-specified cdf (i.e., \(F^2\)) and \((y_1,y_2,\ldots ,y_{n_2})\) is the subsample of the respective observations. Observe that the ‘whole model’ goodness-of-fit can also be verified, using the fact that for \(X\sim F^2\) we have that \(Y=(F^1)^{-1}[ F^2 (X) ]\) is \(F^1\)-distributed. Indeed, a sample \((y_1^{1},y_2^{1},\ldots ,y_{n_1}^{1},y_1^{2},y_2^{2},\ldots ,y_{n_2}^{2})\), where the \(y_t^{1}\)’s are the standardized residuals of the mean-reverting regime, while the \(y_t^{2}\)’s are the transformed variables corresponding to the second regime, i.e., \(y_t^{2}=(F^1)^{-1}[ F^2 (x_{t,2}) ]\), is i.i.d. \(N(0,1)\)-distributed and, hence, the testing procedure is applicable.
3.1.2 Specification II
The \(H_0\) hypothesis now states that the sample \((x_1,x_2,\ldots ,x_T)\) is driven by a regime-switching model defined by Eq. (6) with \(R_t\in \{1,2\}\). Similarly, as in the independent regimes case, the testing procedure is based on extracting the residuals of the mean-reverting process. Indeed, observe that under the \(H_0\) hypothesis the transformation \(h(x_t,x_{t-1},1)\), defined in (7), with parameters \(\alpha _{R_t}, \beta _{R_t}\) and \(\sigma _{R_t} \) corresponding to the current value of the state process \(R_t\), yields an i.i.d. \(N(0,1)\) distributed sample. Thus, the Kolmogorov–Smirnov test can be applied. The test statistic \(d_n\), see (9), is calculated with the standard Gaussian cdf and the sample \((y_1,y_2,\ldots ,y_T)\) of the standardized residuals, i.e., \(y_t=h(x_t,x_{t-1},1)\).
3.1.3 Critical values
Note that the testing procedure described above is valid only if the parameters of the hypothesized distribution are known. Unfortunately, in typical applications the parameters have to be estimated beforehand. If this is the case, then the critical values for the test must be reduced (Čižek et al. 2011). In other words, if the value of the test statistic \(d_n\) is \(d\), then the \(p\) value is overestimated by \(P(d_n \ge d)\). Hence, if this probability is small, then the \(p\) value will be even smaller and the hypothesis will be rejected. However, if it is large, then we have to obtain a more accurate estimate of the \(p\) value.
To cope with this problem, Ross (2002) recommends using Monte Carlo simulations. In our case the procedure reduces to the following steps. First, the parameter vector \(\hat{\theta }\) is estimated from the dataset and the test statistic \(d_n\) is calculated according to formula (9). Next, \(\hat{\theta }\) is used as the parameter vector for \(N\) simulated samples from the assumed model. For each sample a new parameter vector \(\hat{\theta }_i\) is estimated and a new test statistic \(d_n^i\) is calculated using formula (9). Finally, the \(p\) value is obtained as the proportion of simulated samples with test statistic values greater than or equal to \(d_n\), i.e., \(p\text{ value}=\frac{1}{N}\#\{i: d_n^i\ge d_n\}\).
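The Monte Carlo steps above can be sketched generically. The example below is only an illustration under simplifying assumptions: it uses an i.i.d. Gaussian model as a stand-in for a regime-switching model, and the plain supremum distance in place of formula (9), whose exact normalization is not reproduced in this excerpt (for a fixed sample size a monotone rescaling does not change the Monte Carlo \(p\) value). All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev


def ks_distance(sample, cdf):
    """Supremum distance between the edf of `sample` and a continuous `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf(x)
        # The edf jumps from (k-1)/n to k/n at x; check both sides of the jump.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


def mc_p_value(data, estimate, simulate, statistic, n_sim=200, rng=None):
    """Proportion of simulated statistics that are >= the observed one."""
    rng = rng or random.Random(0)
    theta_hat = estimate(data)
    d_obs = statistic(data, theta_hat)
    exceed = 0
    for _ in range(n_sim):
        sim = simulate(theta_hat, len(data), rng)
        theta_i = estimate(sim)  # re-estimate on every simulated sample
        if statistic(sim, theta_i) >= d_obs:
            exceed += 1
    return exceed / n_sim


# Stand-in model (assumption): i.i.d. Gaussian with estimated mean and std.
def estimate_gauss(xs):
    return fmean(xs), pstdev(xs)


def simulate_gauss(theta, n, rng):
    mu, sd = theta
    return [rng.gauss(mu, sd) for _ in range(n)]


def stat_gauss(xs, theta):
    return ks_distance(xs, NormalDist(*theta).cdf)


rng = random.Random(1)
data = [rng.gauss(2.0, 3.0) for _ in range(200)]
print(mc_p_value(data, estimate_gauss, simulate_gauss, stat_gauss))
```

To apply the same scheme to a regime-switching model, `estimate`, `simulate` and `statistic` would be replaced by the model's own estimation routine, simulator and test statistic.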
3.2 Testing in case of a latent state process
3.2.1 The ewedf approach
Now, assume that the sample \((x_1,x_2,\ldots ,x_T)\) is driven by an MRS model. The regimes are not directly observable and, hence, the standard edf approach can be used only if an identification of the state process is performed first. Recall that, as a result of the estimation procedure described in Sect. 2.2, the so-called ‘smoothed inferences’ about the state process are derived. The smoothed inferences are the probabilities that the \(t\)th observation comes from a certain regime given all the available information, \(P(R_t=i|x_1,x_2,\ldots ,x_T)\). Hence, a natural choice is to assign each observation to the most probable regime by letting \(R_t=i\) if \(P(R_t=i|x_1,x_2,\ldots ,x_T)>0.5\). Then, the testing procedure described in Sect. 3.1 is applicable. However, we have to mention that the hypothesis \(H_0\) now states that \((x_1,x_2,\ldots ,x_T)\) is driven by a regime-switching model with known state process values. We call this approach ‘ewedf’, which stands for ‘equally-weighted empirical distribution function’. It was introduced by Janczura and Weron (2009) in the context of electricity spot price MRS models.
3.2.2 The weighted empirical distribution function (wedf)
In the standard goodness-of-fit testing approach based on the edf each observation is taken into account with weight \(\frac{1}{n}\) (i.e., inversely proportional to the size of the sample). However, in MRS models the state process is latent. The estimation procedure (the EM algorithm) only yields the probabilities that a certain observation comes from a given regime. Moreover, in the resulting marginal distribution of the MRS model each observation is, in fact, weighted with the corresponding probability. Therefore, a similar approach should be used in the testing procedure.
However, to the best of our knowledge, none of the applications of the wedf is related to goodness-of-fit testing of Markov regime-switching models. Here, we use the wedf concept to deal with the case when observations cannot be unambiguously classified to one of the regimes and, hence, a natural choice of weights for the wedf seems to be \(w_t=P(R_t=i|x_1,x_2,\ldots ,x_T)=E(\mathbb{I }_{\{R_t=i\}}|x_1,x_2,\ldots ,x_T)\) for the \(i\)th regime observations.
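The difference between the two weighting schemes can be made concrete with a short sketch. It assumes the usual form of a weighted empirical distribution function, \(\sum _t w_t \mathbb{I }_{\{y_t\le x\}}/\sum _t w_t\), and returns the plain supremum distance to a hypothesized cdf; the exact definitions and normalizations used in formulas (11), (14) and (18) are not reproduced in this excerpt, so this is an illustration, not those statistics. The function name and the toy data are hypothetical.

```python
from statistics import NormalDist


def weighted_edf_distance(values, weights, cdf):
    """Supremum distance between a weighted edf and a continuous cdf (sketch)."""
    total = sum(weights)
    cum = 0.0
    d = 0.0
    for y, w in sorted(zip(values, weights)):
        f = cdf(y)
        d = max(d, abs(f - cum / total))  # just below the jump at y
        cum += w
        d = max(d, abs(cum / total - f))  # just after the jump at y
    return d


# Toy residuals with assumed smoothed probabilities of belonging to regime i.
residuals = [-1.2, -0.3, 0.1, 0.8, 2.5]
probs = [0.95, 0.90, 0.60, 0.85, 0.10]

std_normal_cdf = NormalDist().cdf
# wedf-type weights: the smoothed probabilities themselves.
d_wedf = weighted_edf_distance(residuals, probs, std_normal_cdf)
# ewedf-type weights: hard classification, weight 1 if the probability exceeds 0.5.
hard = [1.0 if p > 0.5 else 0.0 for p in probs]
d_ewedf = weighted_edf_distance(residuals, hard, std_normal_cdf)
print(d_wedf, d_ewedf)
```

With hard 0/1 weights the function reduces to the standard edf distance computed on the observations classified to the regime, which mirrors the relation between the ewedf and wedf approaches described in the text.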
3.2.3 The wedf approach for specification II
Lemma 1
If \(H_0\) is true, then \(F_n\) given by (11) is an unbiased, consistent estimator of the distribution of the residuals (in this case Gaussian).
Note that proofs of all lemmas and theorems formulated in this section can be found in the Appendix.
The following theorem yields a version of the K-S test applicable to the parameter-switching MRS model (6). Note that if the state process were observable, it would boil down to the standard K-S test (Lehmann and Romano 2005, p. 584).
Theorem 1
Lemma 2
If \(H_0\) is true, then \(F^i_n(x)\) given by (15) is an unbiased estimator of \(F^i(x)\). Moreover, it is consistent if \(p_{ij}<1\) for all \(i,j=1,2\).
An analogue of Theorem 1 can be derived.
Theorem 2
3.2.4 The wedf approach for specification I
Now, to test the fit of the mean-reverting regime, it is enough to calculate \(d^i_n\) according to formula (18) with the standard Gaussian cdf and \(y_t^1= g(x_{t},\mathbf{{x}}_{t-1})\). Observe that the observations from the second regime are i.i.d. by definition, so the testing procedure is straightforward with the \(F^2\) cdf and the sample \((x_1,x_2,\ldots ,x_{T})\). Moreover, the ‘whole model’ goodness-of-fit can also be verified. Theorem 1 is directly applicable if the distributions of the samples corresponding to both regimes are the same, \(F=F^1=F^2\). Observe that, even if \(F^1\ne F^2\), the test can still be applied using the fact that for \(X\sim F^2\) we have that \(Y=(F^1)^{-1}[ F^2 (X) ]\) is \(F^1\)-distributed. The test statistic \(d_n\) is calculated as in (14) with the \(F^1\) cdf (here Gaussian) and the sample \((y_1^{1},y_2^{1},\ldots ,y_{T-1}^{1},y_1^{2},y_2^{2},\ldots ,y_{T}^{2})\), where \((y_1^{1},y_2^{1},\ldots ,y_{T-1}^{1})\) are the transformed variables of the mean-reverting regime, i.e., \(y_t^{1}= g(x_{t,1},\mathbf{{x}}_{t-1})\), while \((y_1^{2},y_2^{2},\ldots ,y_{T}^{2})\) are the variables corresponding to the second regime, i.e., \(y_t^{2}=(F^1)^{-1}[ F^2 (x_t) ]\).
Note that, as in the case of an observable state process, in the wedf approach we face the problem of estimating values that are later used to compute the test statistic. Again, this problem can be circumvented with the help of Monte Carlo simulations. The \(p\) values can be computed as the proportion of simulated MRS model trajectories with the test statistic \(d_n\), see formulas (14) and (18), greater than or equal to the value of \(d_n\) obtained from the dataset.
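The mapping \(Y=(F^1)^{-1}[ F^2 (X) ]\) used for the ‘whole model’ test can be illustrated as follows. The sketch takes \(F^1\) to be the standard Gaussian cdf, as in the text, and uses an exponential cdf as an example choice of \(F^2\); the function names and the rate parameter are assumptions made only for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_gaussian_scale(x, cdf_second):
    """Map an observation from the second regime to the N(0,1) scale.

    Computes (F^1)^{-1}[F^2(x)] with F^1 the standard Gaussian cdf.
    Requires 0 < F^2(x) < 1, otherwise inv_cdf raises an error.
    """
    return _STD_NORMAL.inv_cdf(cdf_second(x))


def exponential_cdf(rate):
    """Example F^2 (assumption): exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x)


f2 = exponential_cdf(rate=3.0)
# The median of F^2 is mapped to the median of N(0,1), i.e. (approximately) zero.
print(to_gaussian_scale(math.log(2.0) / 3.0, f2))
```

Because both cdfs are monotone, the transformation preserves the ordering of the observations, so the pooled sample can be tested against a single Gaussian cdf.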
3.3 The Rosenblatt transformation
The Rosenblatt transformation is a very useful and general tool; however, it can be used to test the goodness-of-fit of the whole model only. In contrast, the ewedf and the wedf approaches allow for testing the fit in the individual regimes. This can be advantageous in many situations, as we additionally learn which regimes are correctly and which are incorrectly specified. Moreover, the ewedf and the wedf approaches yield estimators of the regime and model cdfs, providing a readily available tool for further testing and model building. On the other hand, in the case of the Rosenblatt transformation an empirical distribution function can only be constructed for the transformed variables, which makes it hard to interpret.
4 Simulations
Table 1 Parameters (\(\alpha _i\), \(\beta _i\), \(\sigma _i^2\)) and transition probabilities (\(p_{ii}\)) of four 2-regime MRS models analyzed in the simulation study of Sect. 4
Model | \(\alpha _1\) | \(\beta _1\) | \(\sigma _1^2\) | \(\alpha _2\) | \(\beta _2\) | \(\sigma _2^2\) | \(p_{11}\) | \(p_{22}\) |
---|---|---|---|---|---|---|---|---|
Sim #1 | 10.00 | 0.80 | 10.0000 | 4.00 | – | 0.5000 | 0.90 | 0.20 |
Sim #2 | 1.00 | 0.80 | 1.0000 | 3.00 | 0.40 | 0.5000 | 0.60 | 0.50 |
Sim #3 | 0.58 | 0.17 | 0.0048 | 1.28 | – | 0.0034 | 0.97 | 0.88 |
Sim #4 | 0.63 | 0.81 | 0.0038 | 0.95 | 0.73 | 0.0200 | 0.96 | 0.91 |
4.1 Known model parameters
We generate 10,000 trajectories of each of the four 2-regime MRS models defined in Table 1. The length of each trajectory is 2,000 observations, which corresponds to 5.5 years of daily data (note that markets for electricity operate 365 days per year). We apply the ewedf, the wedf and the Rosenblatt transformation-based goodness-of-fit tests to each simulated trajectory and then calculate the percentage of rejected hypotheses \(H_0\) at the 5 % significance level. We assume that the model parameters are known. The computation of \(E(X_{t,1}|\mathbf{{x}}_t)\) in the wedf approach requires backward recursion until the previous observation from the mean-reverting regime is found, see (21). However, as the number of observations is limited, the condition \(P(R_t=1|\mathbf{{x}}_t)=1\) might not be fulfilled at all. The estimation scheme therefore requires some approximation or an additional assumption. Here, we assume that for each simulated trajectory the first observation comes from the mean-reverting regime.
In the ewedf approach the tested hypothesis says that the state process is known (and coincides with the proposed classification of the observations to the regimes). As a consequence, once the regimes are identified, it is equivalent to the standard edf approach. To test how it performs for an MRS model with a latent state process, we apply it to the simulated trajectories (we first identify the regimes, then test whether the sample is generated from the assumed MRS model).
Table 2 Percentage of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level calculated from 10,000 simulated trajectories of 2,000 observations each of the four 2-regime MRS models defined in Table 1
Model | ewedf, First | ewedf, Second | ewedf, Model | wedf, First | wedf, Second | wedf, Model | Rtrans, Model | RCM |
---|---|---|---|---|---|---|---|---|
Sim #1 | 0.0641 | 0.8648 | 0.1242 | 0.0527 | 0.0540 | 0.0481 | 0.0542 | 4.33 |
Sim #2 | 0.2244 | 0.0798 | 0.1194 | 0.0499 | 0.0527 | 0.0314 | 0.0493 | 12.69 |
Sim #3 | 0.0586 | 0.4190 | 0.1060 | 0.0817 | 0.0458 | 0.0702 | 0.0902 | 12.81 |
Sim #4 | 0.0519 | 0.3206 | 0.1025 | 0.0505 | 0.0389 | 0.0341 | 0.0553 | 29.63 |
Finally, to measure the quality of regime classification, we use the regime classification measure (RCM) of Ang and Bekaert (2002), see the last column in Table 2. Since the true regime is a Bernoulli random variable, the RCM statistic is essentially a sample estimate of its variance. The RCM statistic is rescaled so that a value of 0 means perfect regime classification and a value of 100 implies that no information about the regimes is revealed. In our case, the regime classification is very good or good for specification I models used for modeling electricity prices in Sect. 5 (Sim #1 and Sim #3) and good or moderately good for specification II models (Sim #2 and Sim #4). The lower RCM values for type I models also imply that in these models the regimes are better separated than in the respective type II models. Finally, in models whose parameters are estimated from the NEPOOL log-prices (Sim #3 and Sim #4) the regimes are less separated than in the other two models.
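For a two-regime model, the RCM of Ang and Bekaert (2002) is commonly written as \(400\cdot \frac{1}{T}\sum _{t=1}^T p_t(1-p_t)\), where \(p_t\) is the smoothed probability of one of the regimes. The following minimal sketch assumes this two-regime form; the function name and the toy inputs are hypothetical.

```python
def regime_classification_measure(smoothed_p1):
    """Two-regime RCM: 400 times the average of p_t * (1 - p_t).

    smoothed_p1[t] is the smoothed probability P(R_t = 1 | x_1, ..., x_T).
    Returns 0 for perfect classification and 100 if every p_t equals 0.5.
    """
    T = len(smoothed_p1)
    return 400.0 * sum(p * (1.0 - p) for p in smoothed_p1) / T


print(regime_classification_measure([0.0, 1.0, 1.0]))    # perfectly classified
print(regime_classification_measure([0.5, 0.5]))          # no information
print(regime_classification_measure([0.9, 0.95, 0.2]))    # intermediate case
```

The term \(p_t(1-p_t)\) is the variance of a Bernoulli variable with success probability \(p_t\), which is why the statistic can be read as a rescaled sample estimate of that variance.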
4.2 Unknown model parameters
- 1. Estimate the parameter vector (\(\hat{\theta }\)) and calculate the test statistic (\(d_n\)) according to formula (9).
- 2. For ‘K-S table’-type (‘K-S tab.’) estimation calculate the \(p\) value using K-S test tables and assuming that the sample comes from a model with parameter vector \(\hat{\theta }\).
- 3. For ‘MC simulation’-type (‘MC sim.’) estimation:
  - (a) simulate \(N=500\) trajectories with parameter vector \(\hat{\theta }\) (these trajectories will be used to compute the estimate of the \(p\) value),
  - (b) for each trajectory \(i=1,\ldots ,N\) estimate the parameter vector (\(\hat{\theta }_i\)) and calculate the test statistic (\(d_n^i\)),
  - (c) calculate the \(p\) value as the proportion of simulated trajectories with test statistic values greater than or equal to \(d_n\), i.e., \(\frac{1}{N}\#\{i: d_n^i\ge d_n\}\).
Table 3 Percentage of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level calculated from 500 simulated trajectories of 2,000 observations each of the models defined in Table 1 with parameters estimated from each sample
Model | \(p\) value method | ewedf, First | ewedf, Second | ewedf, Model | wedf, First | wedf, Second | wedf, Model | Rtrans, Model |
---|---|---|---|---|---|---|---|---|
Sim #1 | K-S tab. | 0.0360 | 0.5860 | 0.0540 | 0.0420 | 0 | 0.0400 | 0.0400 |
Sim #1 | MC sim. | 0.0640 | 0 | 0.0580 | 0.0660 | 0.0580 | 0.0680 | 0.0620 |
Sim #2 | K-S tab. | 0.0040 | 0.1940 | 0.0080 | 0 | 0 | 0 | 0 |
Sim #2 | MC sim. | 0.0320 | 0.0080 | 0.0360 | 0.0440 | 0.0340 | 0.0520 | 0.0460 |
Sim #3 | K-S tab. | 0 | 0.0140 | 0 | 0 | 0 | 0 | 0 |
Sim #3 | MC sim. | 0.0360 | 0.0440 | 0.0400 | 0.0340 | 0.0500 | 0.0300 | 0.0380 |
Sim #4 | K-S tab. | 0.1400 | 0.0020 | 0.0120 | 0 | 0 | 0 | 0 |
Sim #4 | MC sim. | 0.0340 | 0.0180 | 0.0560 | 0.0580 | 0.0380 | 0.0640 | 0.0420 |
Looking at the test results based on the K-S test tables (‘K-S tab.’ in Table 3), for the ewedf approach the rejection percentages deviate significantly from the \(5\,\%\) level. On the other hand, for the wedf and the Rosenblatt transformation-based approaches the \(p\) values are overestimated, which results in rejection percentages much lower than the 5 % significance level. Observe that for most of the models these tests did not reject \(H_0\) for any of the trajectories. Therefore, if \(p\) values obtained with the wedf or the Rosenblatt transformation-based approaches are close to the significance level, the test may fail to reject a false \(H_0\) hypothesis. This is not the case for the testing approach utilizing Monte Carlo simulations (‘MC sim.’ in Table 3), as the obtained rejection percentages are close to the 5 % significance level. This example clearly shows that the wedf and the Rosenblatt transformation tests based on the K-S test tables can only be relied upon if they return a \(p\) value below the significance level (i.e., if they reject the \(H_0\) hypothesis) or well above the significance level. However, if the obtained \(p\) value is close to the significance level, Monte Carlo simulations should be performed.
4.3 Power of the tests
- AR-ARG1 vs. AR-AR The trajectories are simulated from an MRS model defined as:
$$\begin{aligned} X_{t}=\alpha _{R_t}+ (1-\beta _{R_t})X_{t-1} +\sigma _{R_t}X_{t-1}^{\gamma _{R_t}} \epsilon _t, \quad R_t\in \{1,2\}, \end{aligned}$$
where \(\alpha _1=1\), \(\beta _1=0.8\), \(\sigma _1^2=1\), \(\gamma _1=0\), \(\alpha _2=3\), \(\beta _2=0.4\), \(\sigma _2^2=0.05\), \(\gamma _2=1\), \(p_{11}=0.6\) and \(p_{22}=0.5\). The model is denoted by AR-ARG1, which indicates that the first regime is driven by an AR(1) process and the second regime by a heteroskedastic autoregressive process with \(\gamma =1\) (i.e., ARG1). We test whether the simulated trajectories can be described by the model defined in Eq. (6), i.e., following specification II, and denoted here by AR-AR.
- AR-E vs. AR-LN The trajectories are simulated from an MRS model following specification I, see (4) and (5), with an exponential distribution in the second regime, i.e., \(F^2 \sim \text{ Exp}(\lambda )\). The model is denoted here by AR-E and its parameters are given by: \(\alpha =10\), \(\beta =0.6\), \(\sigma ^2=10\), \(\lambda =30\), \(p_{11}=0.6\) and \(p_{22}=0.5\). We test whether the simulated trajectories can be driven by a model following specification I with a log-normal distribution in the second regime (i.e., AR-LN).
- CIR-LN vs. AR-G The trajectories are simulated from an MRS model defined as:
$$\begin{aligned} X_{t,1}&= \alpha _1+(1-\beta _1)X_{t-1,1} +\sigma _1 \sqrt{X_{t-1,1}} \epsilon _t,\\ X_{t,2}&\sim LN\left(\alpha _2,\sigma _2^2\right), \end{aligned}$$
where \(\alpha _1=1\), \(\beta _1=0.8\), \(\sigma _1^2=0.5\), \(\alpha _2=2\), \(\sigma _2^2=0.5\), \(p_{11}=0.6\) and \(p_{22}=0.5\), i.e., the first regime is a discrete time version of the square root process, also known as the CIR process (Cox et al. 1985), and the second is a log-normal random variable. Hence, the name CIR-LN. We test whether the simulated trajectories can be driven by a model following specification I with a Gaussian distribution in the second regime (i.e., AR-G).
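To make the first scenario concrete, the sketch below simulates a trajectory of the AR-ARG1 model with the parameter values listed above. It is a minimal illustration, not the simulation code used in the study: the starting value, the initial regime and the function name are assumptions, regimes are indexed from 0 in the code, and the exponents \(\gamma \) are restricted to the integer values 0 and 1 used in this scenario.

```python
import math
import random


def simulate_ar_arg1(T, alpha, beta, sigma2, gamma, p_stay, x0=1.0, seed=0):
    """Simulate X_t = alpha_R + (1 - beta_R) X_{t-1} + sigma_R X_{t-1}^gamma_R eps_t.

    p_stay[r] is the probability of remaining in regime r (p_11 and p_22).
    Assumptions: the chain starts in the first regime (index 0) at level x0,
    and gamma contains integers (0 or 1), so the power is always real.
    """
    rng = random.Random(seed)
    x, r = x0, 0
    xs, rs = [], []
    for _ in range(T):
        # Draw the current regime from the two-state Markov chain.
        if rng.random() >= p_stay[r]:
            r = 1 - r
        noise = math.sqrt(sigma2[r]) * x ** gamma[r] * rng.gauss(0.0, 1.0)
        x = alpha[r] + (1.0 - beta[r]) * x + noise
        xs.append(x)
        rs.append(r)
    return xs, rs


xs, rs = simulate_ar_arg1(
    T=500,
    alpha=(1.0, 3.0),
    beta=(0.8, 0.4),
    sigma2=(1.0, 0.05),
    gamma=(0, 1),
    p_stay=(0.6, 0.5),
)
print(len(xs), sum(rs) / len(rs))  # sample size and share of second-regime days
```

A trajectory generated this way can then be fed to an estimation routine for the AR-AR model to check how often the misspecification is detected.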
Table 4 Percentages of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level for the alternative models with parameters estimated for each of the 500 simulated trajectories of \(T=100\), \(500\) or \(2{,}000\) observations
T | \(p\) value method | ewedf, First | ewedf, Second | ewedf, Model | wedf, First | wedf, Second | wedf, Model | Rtrans, Model |
---|---|---|---|---|---|---|---|---|
AR-ARG1 vs. AR-AR | ||||||||
2,000 | K-S tab. | 0.6420 | 1.0000 | 1.0000 | 0.0300 | 1.0000 | 1.0000 | 0.9960 |
2,000 | MC sim. | 0.0520 | 1.0000 | 1.0000 | 0.4000 | 1.0000 | 1.0000 | 1.0000 |
500 | K-S tab. | 0.1160 | 0.6100 | 0.5060 | 0.0060 | 0.4440 | 0.0860 | 0.0280 |
500 | MC sim. | 0.0960 | 0.8280 | 0.8480 | 0.1840 | 0.9840 | 0.8780 | 0.9740 |
100 | K-S tab. | 0.0160 | 0.0120 | 0.0120 | 0.0020 | 0.0100 | 0 | 0 |
100 | MC sim. | 0.1140 | 0.1520 | 0.2080 | 0.0800 | 0.2120 | 0.1580 | 0.2580 |
AR-E vs. AR-LN | ||||||||
2,000 | K-S tab. | 0.0820 | 0.9980 | 0.9980 | 0.0620 | 0.9980 | 0.7460 | 0.9820 |
2,000 | MC sim. | 0.0300 | 0.9680 | 0.7580 | 0.0560 | 0.9980 | 0.9620 | 0.9900 |
500 | K-S tab. | 0.0660 | 0.9780 | 0.5700 | 0.0660 | 0.2140 | 0.0580 | 0.1060 |
500 | MC sim. | 0.0800 | 0.2120 | 0.2320 | 0.0940 | 0.8960 | 0.2760 | 0.4000 |
100 | K-S tab. | 0.0260 | 0.2780 | 0.0240 | 0.0300 | 0.0060 | 0.0080 | 0.0140 |
100 | MC sim. | 0.0540 | 0.0180 | 0.0880 | 0.0960 | 0.1620 | 0.1260 | 0.1480 |
CIR-LN vs. AR-G | ||||||||
2,000 | K-S tab. | 0.9900 | 0.9180 | 0.9900 | 0.9900 | 0.9400 | 0.9900 | 0.9900 |
2,000 | MC sim. | 0.9900 | 0.9880 | 0.9900 | 0.9900 | 0.9900 | 0.9900 | 0.9900 |
500 | K-S tab. | 0.9940 | 0.1600 | 0.9900 | 0.9940 | 0.1880 | 0.9860 | 0.6840 |
500 | MC sim. | 0.9920 | 0.7460 | 0.9860 | 0.9940 | 0.8120 | 0.9940 | 0.9940 |
100 | K-S tab. | 0.1860 | 0.0260 | 0.0760 | 0.2520 | 0.0340 | 0.1820 | 0.0200 |
100 | MC sim. | 0.3200 | 0.2440 | 0.3760 | 0.7440 | 0.2920 | 0.7920 | 0.5560 |
Looking at the MC simulation results obtained for the largest samples of \(T=2{,}000\) observations, we can see that in almost all cases the false hypothesis was rejected. The lowest rejection rate was 0.7580 for the ewedf approach, 0.9620 for the wedf approach and 0.9900 for the Rosenblatt transformation. All three lowest rates were obtained for the challenging AR-E vs. AR-LN test scenario. However, if the samples are smaller, the power of the tests decreases. A sample size of \(T=100\) observations does not seem to be enough, especially if the degree of regime separation is low, as in the AR-E vs. AR-LN scenario. This is not the case if the definitions of both regimes are significantly different, as in the CIR-LN vs. AR-G scenario, for which the power is satisfactory even if \(T=100\). Comparing the ewedf and wedf approaches, we can observe that the latter yields higher rejection rates on average. A comparison of the wedf and the Rosenblatt transformation-based approach does not yield such a clear picture. For the AR-ARG1 vs. AR-AR scenario the power of the Rosenblatt transformation-based approach is higher than that of the wedf approach, for the AR-E vs. AR-LN scenario it is only slightly higher, while for the CIR-LN vs. AR-G scenario it is the wedf approach which produces higher rejection rates (for small samples).
Table 5 Percentages of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level for the alternative models with parameters estimated for each of the 500 simulated trajectories of \(T=100\), \(500\) or \(2{,}000\) observations
T | \(p\) value method | ewedf, First | ewedf, Second | ewedf, Model | wedf, First | wedf, Second | wedf, Model | Rtrans, Model |
---|---|---|---|---|---|---|---|---|
AR-ARG1 vs. AR-AR | ||||||||
2,000 | K-S tab. | 0 | 0.2300 | 0.0080 | 0 | 0 | 0 | 0 |
2,000 | MC sim. | 0.0560 | 0.0200 | 0.0720 | 0.0660 | 0.0440 | 0.0700 | 0.0560 |
500 | K-S tab. | 0.0080 | 0.0620 | 0.0160 | 0 | 0 | 0 | 0 |
500 | MC sim. | 0.0260 | 0.0020 | 0.0520 | 0.0480 | 0.0160 | 0.0560 | 0.0460 |
100 | K-S tab. | 0.0120 | 0.0380 | 0.0040 | 0 | 0 | 0 | 0 |
100 | MC sim. | 0.0280 | 0.0320 | 0.0840 | 0.0380 | 0.0040 | 0.0560 | 0.0660 |
AR-E vs. AR-LN | ||||||||
2,000 | K-S tab. | 0.9980 | 0.7420 | 0.9980 | 0.9980 | 0.7700 | 0.9980 | 0.9980 |
2,000 | MC sim. | 0.9980 | 0.4500 | 0.9980 | 0.9980 | 0.8440 | 0.9980 | 0.9980 |
500 | K-S tab. | 0.8600 | 0.2000 | 0.8600 | 0.8640 | 0.1620 | 0.8640 | 0.8600 |
500 | MC sim. | 0.8740 | 0.1620 | 0.8740 | 0.8740 | 0.3480 | 0.8760 | 0.8760 |
100 | K-S tab. | 0.3200 | 0.0020 | 0.3200 | 0.3200 | 0.0020 | 0.3200 | 0.3140 |
100 | MC sim. | 0.3560 | 0.0620 | 0.3660 | 0.3520 | 0.0380 | 0.3600 | 0.3560 |
CIR-LN vs. AR-G | ||||||||
2,000 | K-S tab. | 0.0480 | 0.4160 | 0.0860 | 0.0660 | 0.0260 | 0.0800 | 0.0540 |
2,000 | MC sim. | 0.0940 | 0.1200 | 0.0960 | 0.0680 | 0.1180 | 0.0800 | 0.0840 |
500 | K-S tab. | 0.0560 | 0.0540 | 0.0500 | 0.0820 | 0.0120 | 0.0540 | 0.0580 |
500 | MC sim. | 0.0900 | 0.0660 | 0.1060 | 0.1080 | 0.0780 | 0.1080 | 0.1200 |
100 | K-S tab. | 0.0100 | 0.0160 | 0.0100 | 0.0180 | 0.0100 | 0.0080 | 0.0160 |
100 | MC sim. | 0.0440 | 0.0540 | 0.1360 | 0.0740 | 0.0700 | 0.1540 | 0.1520 |
5 Application to electricity spot prices
- a 2-regime MRS model with mean-reverting, see (4), base regime (\(R_t=1\)) and i.i.d. shifted log-normally distributed spikes (\(R_t=2\)) or
- a 3-regime MRS model with mean-reverting, see (4), base regime (\(R_t=1\)), i.i.d. shifted log-normally distributed spikes (\(R_t=2\)) and i.i.d. drops (\(R_t=3\)) distributed according to the inverted shifted log-normal law.
Furthermore, recall that \(X\) follows the shifted log-normal law or the inverted shifted log-normal law if \(\log (X-q)\) or \(\log (q-X)\), respectively, has a Gaussian distribution. The cutoff level \(q\) can be different for the spike and the drop regime; here, motivated by the results of Janczura and Weron (2013), we set it to the first and the third quartile of the dataset for drops and spikes, respectively. Using shifted log-normal distributions makes it possible to increase the degree of separation as measured by RCM and, hence, to increase the power of the tests (see Sect. 4). For instance, the 2-regime model now yields an RCM of 4.79, compared to 12.81 for Sim #3 in Table 2. It also leads to more fundamentally justified models, since electricity price spikes are generally connected with scheduling units with higher marginal costs (like gas turbines, see e.g., Eydeland and Wolyniec 2012; Weron 2006).
Finally, note that such simple one-factor models may not be complex enough to address all features of electricity prices. In particular, the electricity forward prices implied by these spot price models exhibit the so-called Samuelson effect (i.e., a decrease in volatility with increasing time to maturity; for the considered models the volatility scales as \(e^{-\beta (T-t)}\)), but the rate of decrease is completely determined by the speed of mean-reversion \(\beta \) (Janczura and Weron 2012). However, the rate of decrease should be large only for maturities up to a year (Kiesel et al. 2009). Perhaps incorporating another stochastic factor would lead to a more realistic forward price curve.
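To see how strongly a single parameter pins down the volatility term structure, the scaling factor \(e^{-\beta (T-t)}\) can be evaluated directly. The sketch below is only an illustration: it uses the two \(\beta \) estimates reported for the fitted models (0.21 and 0.29) and assumes, which the text does not state explicitly, that \(\beta \) can be read as a per-day rate because the data are daily. The function names are hypothetical.

```python
import math


def forward_vol_scaling(beta, tau):
    """Volatility scaling factor exp(-beta * tau) for time to maturity tau."""
    return math.exp(-beta * tau)


def vol_half_life(beta):
    """Time to maturity at which the scaling factor has dropped to one half."""
    return math.log(2.0) / beta


# Assumption: beta is treated as a per-day rate, so tau is measured in days.
for beta in (0.21, 0.29):
    factors = [forward_vol_scaling(beta, tau) for tau in (1, 7, 30)]
    print("beta =", beta, "half-life (days) =", vol_half_life(beta), "factors =", factors)
```

Under this reading the implied forward volatility is almost flat beyond a few weeks to maturity, which illustrates why a second stochastic factor is suggested for a more realistic curve.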
Following Weron (2009), the deseasonalization is conducted in three steps; for a thorough study of modeling seasonal components in electricity spot prices we refer to Janczura et al. (2012). First, the long-term seasonal component (LTSC) \(T_t\) is estimated from daily spot prices \(P_t\) using a wavelet filter-smoother of order 6. A single non-parametric LTSC is used here to represent the long-term non-periodic fuel price levels, the changing climate/consumption conditions throughout the years and strategic bidding practices. As shown by Janczura and Weron (2010), the wavelet-estimated LTSC reflects the ‘average’ fuel price level fairly well, understood as a combination of natural gas, crude oil and coal prices; see also Eydeland and Wolyniec (2012) and Karakatsani and Bunn (2010) for a treatment of fundamental and behavioral drivers of electricity prices. On the other hand, as discussed recently in Janczura and Weron (2012), the use of the wavelet-based LTSC is somewhat controversial. Predicting it beyond the next few weeks is a difficult task, because individual wavelet functions are quite localized in time or (more generally) in space. Preliminary research suggests, however, that despite this feature the wavelet-based LTSC can be extrapolated into the future, yielding a better on-average prediction of the level of future spot prices than an extrapolation of a sinusoidal LTSC (Nowotarski et al. 2011).
Parameters of the 2- and 3-regime MRS models with mean-reverting base regime and independent spikes and drops driven by shifted log-normal laws fitted to the deseasonalized NEPOOL log-prices
Model | Base regime: \(\alpha \) | Base regime: \(\beta \) | Base regime: \(\sigma ^2\) | Spike regime: \(\alpha _2\) | Spike regime: \(s_2^2\) | Drop regime: \(\alpha _3\) | Drop regime: \(s_3^2\) | Probabilities: \(p_{11}\) | Probabilities: \(p_{22}\) | Probabilities: \(p_{33}\)
---|---|---|---|---|---|---|---|---|---|---
2-Regime | 0.71 | 0.21 | 0.0060 | \(-1.57\) | 0.32 | – | – | 0.98 | 0.74 | –
3-Regime | 0.99 | 0.29 | 0.0053 | \(-1.65\) | 0.33 | \(-1.93\) | 0.19 | 0.96 | 0.76 | 0.89
For both analyzed MRS models, tests based on the ewedf, the wedf and the Rosenblatt transformation are performed. The \(p\) values in Table 7 are reported both for the standard approach utilizing the K-S test tables (which generally leads to overestimated \(p\) values) and the much slower but more accurate Monte Carlo setup. The testing procedure in the Monte Carlo case is analogous to the one used in the simulation study, see Sect. 4.2 for a detailed description. Again, in order to verify the ‘whole model’ goodness-of-fit, we transform the spike and drop regime observations so that both samples are \(N(0,1)\)-distributed.
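The tabled variant of such a test can be sketched as follows. The formal definitions are given in Sect. 3.2.2 and are not reproduced in this section, so two things in the sketch are assumptions: the weighted edf is taken as \(F_n(t)=\sum _k w_k \mathbf {1}\{y_k\le t\}/\sum _k w_k\), and the supremum distance is scaled by \(\sum _k w_k/\sqrt{\sum _k w_k^2}\), which reduces to the classical \(\sqrt{n}\) for equal weights. The function names and the simulated weights are hypothetical stand-ins, not the regime probabilities estimated for NEPOOL.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def weighted_ks_statistic(values, weights, cdf):
    """Scaled sup-distance between a weighted edf and a hypothesized cdf.

    Assumes continuous data (no ties) and positive weights.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    sum_sq = sum(w * w for _, w in pairs)
    cum, sup = 0.0, 0.0
    for y, w in pairs:
        f = cdf(y)
        sup = max(sup, abs(f - cum / total))  # distance just before the jump at y
        cum += w
        sup = max(sup, abs(cum / total - f))  # distance just after the jump at y
    return total / math.sqrt(sum_sq) * sup    # assumed scaling, see the lead-in


def kolmogorov_pvalue(d, terms=100):
    """Asymptotic tabled p value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 d^2)."""
    if d < 0.2:
        return 1.0  # the series converges slowly here and the p value is essentially 1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * d * d) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Hypothetical data: N(0,1) scores with arbitrary positive weights.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
weights = [rng.uniform(0.05, 1.0) for _ in sample]

d = weighted_ks_statistic(sample, weights, normal_cdf)
p = kolmogorov_pvalue(d)
print("statistic:", d, "tabled p value:", p)
```

This sketch covers only the tabled p value. It takes the hypothesized distribution as fully known, so it does not account for parameter estimation, which is in line with the remark that the tabled values tend to be overestimated and that the Monte Carlo setup is the more accurate option.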
Table 7: The \(p\) values of the K-S test based on the ewedf, the wedf and the Rosenblatt transformation (Rtrans) for the 2- and 3-regime MRS models of the deseasonalized NEPOOL log-prices
Method | ewedf: Base | ewedf: Spike | ewedf: Drop | ewedf: Model | wedf: Base | wedf: Spike | wedf: Drop | wedf: Model | Rtrans: Model
---|---|---|---|---|---|---|---|---|---
2-Regime model | |||||||||
K-S tab. | 0.15 | 0.54 | – | 0.2170 | 0.05 | 0.87 | – | 0.0960 | 0.2850
MC sim. | 0.00 | 0.34 | – | 0.0240 | 0.00 | 0.50 | – | 0.0020 | 0.0470
3-Regime model | |||||||||
K-S tab. | 0.42 | 0.58 | 0.99 | 0.3950 | 0.29 | 0.78 | 0.99 | 0.3550 | 0.3900
MC sim. | 0.08 | 0.26 | 0.91 | 0.0780 | 0.08 | 0.35 | 0.96 | 0.0820 | 0.1190
6 Conclusions
While most of the electricity spot price models proposed in the literature are elegant, their fit to empirical data either has not been examined thoroughly or the signs of a bad fit have been ignored. As the empirical study of Sect. 5 has shown, even reasonable-looking and popular models should be carefully tested before they are put to use in trading or risk management departments. The goodness-of-fit wedf-based test introduced in Sect. 3.2.2 provides an efficient tool for accepting or rejecting a given Markov regime-switching (MRS) model for a particular data set. While its performance (including power; see Sect. 4.3) is similar to that of the Rosenblatt transformation-based approach, it provides an edge over the latter by yielding \(p\) values for the individual regimes. For instance, this makes it possible to observe that in the 3-regime model the worst fit is obtained for the base regime. Perhaps the simple AR(1) structure is not enough to model the complex dynamics of electricity spot prices in the relatively calm, non-spiky periods.
However, in this paper we have not restricted ourselves to MRS models but pursued a more general goal. Namely, we have proposed a goodness-of-fit testing scheme for the marginal distribution of regime-switching models, including variants with an observable and with a latent state process. For both specifications we have described the testing procedure. The models with a latent state process (i.e., MRS models) required the introduction of the concept of the weighted empirical distribution function (wedf) and a generalization of the Kolmogorov–Smirnov test to yield an efficient testing tool.
We have focused on two specifications of regime-switching models commonly used in the energy economics literature: one with dependent autoregressive states and a second one with independent autoregressive and i.i.d. regimes. Nonetheless, the proposed approach can be easily applied to other specifications of regime-switching models (for instance, to 3-regime models with heteroscedastic base regime dynamics; see Janczura and Weron 2010). Very likely it can also be extended to other goodness-of-fit edf-type tests, such as the Anderson–Darling test. As the latter puts more weight on the observations in the tails of the distribution than the Kolmogorov–Smirnov test, it might be more discriminatory and provide a better testing tool for extremely spiky data. Future work will be conducted in this direction.
Finally, note that a good in-sample fit does not necessarily imply good forecasting performance. Although Kosater and Mosler (2006) found for German electricity price data that for long-run point forecasts (30–80 days ahead) an MRS model with regimes driven by two AR(1) processes was slightly more accurate than a simple AR(1) model, for shorter time horizons both model classes performed alike. It remains an open question how the MRS models fitted to NEPOOL log-prices in Sect. 5 perform in terms of forecasting. The adequacy of MRS models for forecasting in general has been questioned by Bessec and Bouabdallah (2005). However, as Weron and Misiorek (2008) have shown, regime-switching models may behave better than their linear competitors in volatile periods. They might also have an edge in density forecasts, but this has yet to be verified.
Acknowledgments
We thank two anonymous reviewers for their comments and suggested improvements. This paper has also benefited from conversations with the participants of the DStatG 2010 Annual Meeting, the Trondheim Summer 2011 Energy Workshop, the 2011 WPI Conference in Energy Finance, the Energy Finance Christmas Workshop (EFC11), the 9th International Conference ‘European Energy Market’ (EEM12) and the seminars at Macquarie University, National University of Singapore and University of Poznań. This work was supported by funds from the National Science Centre (NCN, Poland) through grant no. 2011/01/B/HS4/01077.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.