Goodness-of-fit testing for the marginal distribution of regime-switching models with an application to electricity spot prices
Open Access Article
DOI: 10.1007/s10182-012-0202-9
Abstract
This paper complements a recently published study (Janczura and Weron in AStA-Adv Stat Anal 96(3):385–407, 2012) on efficient estimation of Markov regime-switching models. Here, we propose a new goodness-of-fit testing scheme for the marginal distribution of such models. We consider models with an observable (like threshold autoregressions) as well as a latent state process (like Markov regime-switching). The test is based on the Kolmogorov–Smirnov supremum-distance statistic and the concept of the weighted empirical distribution function. The motivation for this research comes from a recent stream of literature in energy economics concerning electricity spot price models. While the existence of distinct regimes in such data is generally unquestionable (due to the supply stack structure), the actual goodness-of-fit of the models requires statistical validation. We illustrate the proposed scheme by testing whether commonly used Markov regime-switching models fit deseasonalized electricity prices from the NEPOOL (US) day-ahead market.
Keywords
Regime-switching, Energy economics, Goodness-of-fit, Weighted empirical distribution function, Kolmogorov–Smirnov test
1 Introduction
Regime-switching models have attracted a lot of attention in recent years. Their flexible specification, which allows for abrupt changes in model dynamics, has made them popular not only in econometrics (Choi 2009; Hamilton 1996b; Lux and Morales-Arias 2010), but also in fields as diverse as traffic modeling (Cetin and Comert 2006), population dynamics (Luo and Mao 2007), river flow analysis (Vasas et al. 2007) and earthquake counts (Bulla and Berzel 2008). This paper is motivated by yet another stream of literature: electricity spot price models in energy economics (Bierbrauer et al. 2007; De Jong 2006; Erlwein et al. 2010; Huisman and de Jong 2003; Janczura and Weron 2010, 2012; Karakatsani and Bunn 2008, 2010; Mari 2008; Misiorek et al. 2006; Weron 2009). Regime-switching models have seen extensive use in this area due to their relative parsimony (a prerequisite in derivatives pricing) and their ability to capture the unique characteristics of electricity prices (in particular, the spiky and non-linear price behavior). While the existence of distinct regimes in electricity prices is generally unquestionable (being a consequence of the non-linear, heterogeneous supply stack structure in power markets, see e.g., Eydeland and Wolyniec 2012; Weron 2006), the actual goodness-of-fit of the models requires statistical validation.
However, recent work concerning the statistical fit of regime-switching models has been mainly devoted to testing parameter stability versus the regime-switching hypothesis. Several tests have been constructed to verify the number of regimes. Most of them exploit the likelihood ratio technique (Cho and White 2007; Garcia 1998), but there are also approaches related to recurrence times (Sen and Hsieh 2009), likelihood criteria (Celeux and Durand 2008) or the information matrix (Hu and Shin 2008). Specification tests to detect autocorrelation and ARCH effects were proposed by Hamilton (1996a), based on the score function technique, and more recently by Smith (2008), utilizing the Rosenblatt transformation (see also Sect. 3.3). Smith found that the performance of the Ljung–Box test improved when it was applied to the normally distributed Rosenblatt-transformed series. However, the Markov regime-switching models he considered were relatively simple and had two states differing only in mean. Interestingly, the Rosenblatt transformation was used earlier for evaluating density forecasts of regime-switching (Berkowitz 2001; Diebold et al. 1998; Haas et al. 2004) and stochastic volatility models (Kim et al. 1998), typically in a risk management context.
On the other hand, to the best of our knowledge, procedures for goodness-of-fit testing of the marginal distribution of regime-switching models have not been derived to date (with the exception of Janczura and Weron 2009, where an ewedf-type test was introduced in the context of electricity spot price models, see Sect. 3.2.1 for details). With this paper we want to fill this gap and propose testing procedures based on the empirical distribution function (edf) and built on the Kolmogorov–Smirnov test, which are dedicated to regime-switching models with observable as well as latent state processes. In contrast to the approaches based on the Rosenblatt transformation, the techniques proposed in this paper allow for testing the fit in the individual regimes as well as of the whole model. This can be advantageous in many situations, as we additionally learn which regimes are correctly and which are incorrectly specified. The derivation of the tests is not straightforward and, in the case of a latent state process, requires the concept of the weighted empirical distribution function (wedf). Finally, we should note that the term “marginal distribution” does not mean that the proposed tests ignore the dynamic regime structure. On the contrary, the dynamic structure is taken into account when constructing the residuals used in the testing procedures.
The paper is structured as follows. In Sect. 2 we describe the structure of the analyzed regime-switching models and briefly explain the estimation process (for details we refer to an article recently published in AStA; Janczura and Weron 2012). In Sect. 3 we introduce goodness-of-fit testing procedures appropriate for regime-switching models with both observable and latent state processes. Next, in Sect. 4 we provide a simulation study and check the performance of the proposed techniques. Since the motivation for this paper comes from the energy economics literature, in Sect. 5 we show how the presented testing procedure can be applied to verify the fit of Markov regime-switching models to electricity spot prices. Finally, in Sect. 6 we conclude and outline future work.
2 Regime-switching models
2.1 Model definition
2.2 Estimation

Step 1 Denote the observation vector by \(\mathbf x_T =(x_1,x_2,\ldots ,x_T)\). For a parameter vector \(\theta ^{(n)}\) compute the conditional probabilities \(P(R_t = i|\mathbf x_T ;\theta ^{(n)})\), the so-called ‘smoothed inferences’, for the process being in regime \(i\) at time \(t\).

Step 2 Calculate new, more accurate maximum likelihood estimates \(\theta ^{(n+1)}\) using the log-likelihood function weighted with the smoothed inferences from Step 1, i.e.,
$$\begin{aligned} \log \left[L(\theta ^{(n+1)})\right]=\sum _{i=1}^2\sum _{t=1}^T P(R_t = i|\mathbf x_T ;\theta ^{(n)})\log \left[f_i(x_t|\mathbf x_{t-1} ;\theta ^{(n+1)})\right], \end{aligned}$$
where \(f_i(x_t|\mathbf x_{t-1} ;\theta ^{(n+1)})\) is the conditional density of the \(i\)th regime, and update the transition probabilities:
$$\begin{aligned} p^{(n+1)}_{ij}= \frac{\sum _{t=2}^T P(R_t =j, R_{t-1}=i|\mathbf x _{T};\theta ^{(n)})}{\sum _{t=2}^{T}P(R_{t-1}=i|\mathbf x _{T};\theta ^{(n)})}. \end{aligned}$$
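The transition-probability update in Step 2 only needs the smoothed joint probabilities of consecutive regimes. The sketch below assumes that the E-step has already produced \(P(R_{t-1}=i, R_t=j|\mathbf x_T;\theta^{(n)})\) for \(t=2,\ldots,T\); the nested-list layout, the 0-based regime indices and the function name `update_transition_probs` are our own illustrative choices. The denominator uses the identity \(P(R_{t-1}=i|\mathbf x_T)=\sum_j P(R_{t-1}=i,R_t=j|\mathbf x_T)\).

```python
def update_transition_probs(joint):
    """M-step update of the transition matrix from smoothed joint probabilities.

    joint[s][i][j] is assumed to hold P(R_{t-1}=i, R_t=j | x_T; theta^(n))
    for t = s + 2, i.e. the list covers t = 2, ..., T (0-based regime indices).
    Returns p[i][j] = sum_t joint_t[i][j] / sum_t P(R_{t-1}=i | x_T).
    """
    k = len(joint[0])
    numer = [[0.0] * k for _ in range(k)]
    for slice_t in joint:
        for i in range(k):
            for j in range(k):
                numer[i][j] += slice_t[i][j]
    p = []
    for i in range(k):
        # P(R_{t-1}=i | x_T) equals the row sum of the joint probabilities
        denom = sum(numer[i])
        p.append([numer[i][j] / denom for j in range(k)])
    return p


# Hypothetical smoothed joint probabilities for t = 2 and t = 3
joint = [
    [[0.7, 0.1], [0.1, 0.1]],
    [[0.5, 0.2], [0.1, 0.2]],
]
P = update_transition_probs(joint)
print(P)  # rows sum to one; values close to [[0.8, 0.2], [0.4, 0.6]]
```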
3 Goodnessoffit testing
In this section, we introduce a goodness-of-fit testing technique that can be applied to evaluate the fit of regime-switching models. It is based on the Kolmogorov–Smirnov (KS) goodness-of-fit test and verifies whether the null hypothesis \(H_0\), that the observations come from the distribution implied by the model specification, can be rejected. The procedure can easily be adapted to other empirical distribution function (edf) type tests, like the Anderson–Darling test.
3.1 Testing in case of an observable state process
3.1.1 Specification I
Further, observe that the transformation \(h(X_{t+k,1},X_{t,1},k)\) is based on subtracting the conditional mean from \(X_{t+k,1}\) and dividing by the conditional standard deviation. Indeed, \((1-\beta )^k X_{t,1} + \alpha \frac{1-(1-\beta )^k}{\beta }\) is the conditional expected value of \(X_{t+k,1}\) given \((X_{1,1},X_{2,1},\ldots ,X_{t,1})\) and \(\sigma ^2\frac{1-(1-\beta )^{2k}}{1-(1-\beta )^2}\) is the respective conditional variance.
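The conditional moments above translate directly into code. The following is a sketch of the transformation \(h(x_{t+k},x_t,k)\) under the AR(1) parameterization \(X_{t,1}=\alpha+(1-\beta)X_{t-1,1}+\sigma\epsilon_t\) implied by these moments, assuming \(0<\beta<2\) so that \(|1-\beta|<1\); the function name mirrors the notation, while the argument names are ours. The usage lines check the one-step case by hand and then verify on a simulated path that non-overlapping 3-step residuals look standard normal.

```python
import math
import random
from statistics import fmean, pstdev


def h(x_next, x_prev, k, alpha, beta, sigma2):
    """Standardize X_{t+k,1} given X_{t,1} = x_prev for the AR(1) base regime.

    Assumes X_{t,1} = alpha + (1 - beta) X_{t-1,1} + sigma eps_t with
    eps_t ~ N(0, 1), 0 < beta < 2 and sigma2 = sigma ** 2.
    """
    phi = 1.0 - beta
    # conditional mean: (1 - beta)^k x_t + alpha (1 - (1 - beta)^k) / beta
    cond_mean = phi**k * x_prev + alpha * (1.0 - phi**k) / beta
    # conditional variance: sigma^2 (1 - (1 - beta)^(2k)) / (1 - (1 - beta)^2)
    cond_var = sigma2 * (1.0 - phi ** (2 * k)) / (1.0 - phi**2)
    return (x_next - cond_mean) / math.sqrt(cond_var)


# One-step case: (x_{t+1} - alpha - (1 - beta) x_t) / sigma = (5 - 1 - 1) / 2
print(h(5.0, 2.0, 1, alpha=1.0, beta=0.5, sigma2=4.0))  # 1.5

# Simulated base regime: non-overlapping 3-step residuals should look N(0, 1)
rng = random.Random(42)
a, b, s2 = 10.0, 0.8, 10.0
path = [a / b]
for _ in range(30000):
    path.append(a + (1.0 - b) * path[-1] + math.sqrt(s2) * rng.gauss(0.0, 1.0))
res = [h(path[t + 3], path[t], 3, a, b, s2) for t in range(0, len(path) - 3, 3)]
print(round(fmean(res), 3), round(pstdev(res), 3))
```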
The goodness-of-fit of the marginal distribution of the individual regimes can be formally tested. For the mean-reverting regime, \(F\) is the standard Gaussian cdf and \((y_1,y_2,\ldots ,y_{n_1})\) is the subsample of the standardized residuals obtained by applying transformation (7), while for the second regime, \(F\) is the model-specified cdf (i.e., \(F^2\)) and \((y_1,y_2,\ldots ,y_{n_2})\) is the subsample of the respective observations. Observe that the ‘whole model’ goodness-of-fit can also be verified, using the fact that for \(X\sim F^2\) the variable \(Y=(F^1)^{-1}[ F^2 (X) ]\) is \(F^1\)-distributed. Indeed, the sample \((y_1^{1},y_2^{1},\ldots ,y_{n_1}^{1},y_1^{2},y_2^{2},\ldots ,y_{n_2}^{2})\), where the \(y_t^{1}\)'s are the standardized residuals of the mean-reverting regime and the \(y_t^{2}\)'s are the transformed variables corresponding to the second regime, i.e., \(y_t^{2}=(F^1)^{-1}[ F^2 (x_{t,2}) ]\), is i.i.d. \(N(0,1)\)-distributed and, hence, the testing procedure is applicable.
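For the ‘whole model’ test, the two subsamples are pooled after mapping the second regime onto the scale of the first. The sketch below assumes, purely for illustration, that \(F^1\) is the standard Gaussian and \(F^2\) is lognormal with \(\log X\sim N(\mu,s^2)\); the helper names and the input values are hypothetical. The KS statistic is computed as the usual supremum distance between the edf and \(F\), evaluated on both sides of each jump of the edf. In this lognormal case \((F^1)^{-1}[F^2(x)]\) simplifies to \((\log x-\mu)/s\), which gives a simple sanity check.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # F^1: standard Gaussian cdf of the residuals


def ks_statistic(sample, cdf):
    """Supremum distance sup_x |F_n(x) - F(x)| between the edf and cdf."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # the edf jumps from (i-1)/n to i/n at y, so check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lognormal_cdf(x, mu, s2):
    """Assumed F^2: log X ~ N(mu, s2)."""
    if x <= 0:
        return 0.0
    return STD_NORMAL.cdf((math.log(x) - mu) / math.sqrt(s2))


def to_first_regime_scale(x, mu, s2):
    """Y = (F^1)^{-1}[F^2(X)]; N(0,1)-distributed if X ~ F^2."""
    return STD_NORMAL.inv_cdf(lognormal_cdf(x, mu, s2))


# Hypothetical inputs: residuals of the mean-reverting regime and spike values
residuals = [0.3, -1.1, 0.8, -0.2]
spikes = [50.0, 70.0, 41.0]
pooled = residuals + [to_first_regime_scale(x, 4.0, 0.5) for x in spikes]
d_n = ks_statistic(pooled, STD_NORMAL.cdf)
```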
3.1.2 Specification II
The \(H_0\) hypothesis now states that the sample \((x_1,x_2,\ldots ,x_T)\) is driven by a regime-switching model defined by Eq. (6) with \(R_t\in \{1,2\}\). As in the independent-regimes case, the testing procedure is based on extracting the residuals of the mean-reverting process. Indeed, under the \(H_0\) hypothesis the transformation \(h(x_t,x_{t-1},1)\), defined in (7), with parameters \(\alpha _{R_t}, \beta _{R_t}\) and \(\sigma _{R_t}\) corresponding to the current value of the state process \(R_t\), yields an i.i.d. \(N(0,1)\)-distributed sample. Thus, the Kolmogorov–Smirnov test can be applied. The test statistic \(d_n\), see (9), is calculated with the standard Gaussian cdf and the sample \((y_1,y_2,\ldots ,y_T)\) of standardized residuals, i.e., \(y_t=h(x_t,x_{t-1},1)\).
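When the state process is observable, the specification II residuals use, at each time \(t\), the parameters of the regime that is active at \(t\). A minimal sketch, assuming Gaussian AR(1) regimes as in (6), regimes coded 0 and 1 (our convention) and a hypothetical function name; the resulting sample would then be passed to a KS routine with the standard Gaussian cdf.

```python
import math


def spec2_residuals(x, regimes, alpha, beta, sigma2):
    """y_t = h(x_t, x_{t-1}, 1) with the parameters of the current regime R_t.

    x        -- observations x_1, ..., x_T (list of floats)
    regimes  -- observed states R_1, ..., R_T coded 0 or 1
    alpha, beta, sigma2 -- per-regime parameter lists, indexed by regime
    Returns T - 1 residuals, one for each t = 2, ..., T.
    """
    out = []
    for t in range(1, len(x)):
        i = regimes[t]
        cond_mean = alpha[i] + (1.0 - beta[i]) * x[t - 1]
        out.append((x[t] - cond_mean) / math.sqrt(sigma2[i]))
    return out


y = spec2_residuals([1.0, 2.0, 3.0], [0, 0, 1],
                    alpha=[1.0, 3.0], beta=[0.8, 0.4], sigma2=[1.0, 0.25])
# y is approximately [0.8, -2.4]; under H0 such residuals are i.i.d. N(0, 1)
```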
3.1.3 Critical values
Note that the testing procedure described above is valid only if the parameters of the hypothesized distribution are known. Unfortunately, in typical applications the parameters have to be estimated beforehand. If this is the case, then the critical values for the test must be reduced (Čižek et al. 2011). In other words, if the observed value of the test statistic \(d_n\) is \(d\), then the probability \(P(d_n \ge d)\) read from the standard KS distribution overestimates the \(p\) value. Hence, if this probability is small, then the true \(p\) value will be even smaller and the hypothesis will be rejected. However, if it is large, we have to obtain a more accurate estimate of the \(p\) value.
To cope with this problem, Ross (2002) recommends using Monte Carlo simulations. In our case the procedure reduces to the following steps. First, the parameter vector \(\hat{\theta }\) is estimated from the dataset and the test statistic \(d_n\) is calculated according to formula (9). Next, \(\hat{\theta }\) is used as the parameter vector for \(N\) simulated samples from the assumed model. For each sample a new parameter vector \(\hat{\theta }_i\) is estimated and a new test statistic \(d_n^i\) is calculated using formula (9). Finally, the \(p\) value is obtained as the proportion of simulated samples with test statistic values greater than or equal to \(d_n\), i.e., \(p\text{ value}=\frac{1}{N}\#\{i: d_n^i\ge d_n\}\).
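The Monte Carlo recipe is a parametric bootstrap. To keep the sketch short and self-contained, it replaces the MRS model by a stand-in i.i.d. Gaussian model whose parameters \(\hat\theta=(\hat\mu,\hat\sigma)\) are estimated by the sample mean and standard deviation; for an MRS model, `fit` and the simulation step would be replaced by the EM estimation and a model simulator. All function names are hypothetical. The essential point is that \(\hat\theta_i\) is re-estimated on every simulated sample before \(d_n^i\) is computed.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, cdf):
    """Supremum distance sup_x |F_n(x) - F(x)| between the edf of sample and cdf."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit(sample):
    """Stand-in estimator of theta-hat: Gaussian mean and standard deviation."""
    return fmean(sample), stdev(sample)


def ks_with_fitted_params(sample):
    """Estimate theta-hat from the sample, then compute d_n with the fitted cdf."""
    mu, sd = fit(sample)
    return ks_statistic(sample, NormalDist(mu, sd).cdf)


def mc_pvalue(data, n_sim=500, seed=0):
    """p value = #{i : d_n^i >= d_n} / N, re-estimating theta on every simulated sample."""
    rng = random.Random(seed)
    d_n = ks_with_fitted_params(data)
    mu, sd = fit(data)                        # theta-hat used to simulate the N samples
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(mu, sd) for _ in range(len(data))]
        if ks_with_fitted_params(simulated) >= d_n:
            exceed += 1
    return exceed / n_sim


rng = random.Random(1)
skewed = [rng.expovariate(1.0) for _ in range(300)]   # clearly non-Gaussian data
p_skewed = mc_pvalue(skewed, n_sim=200)
print(p_skewed)  # small, so the Gaussian H0 would be rejected
```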
3.2 Testing in case of a latent state process
3.2.1 The ewedf approach
Now, assume that the sample \((x_1,x_2,\ldots ,x_T)\) is driven by an MRS model. The regimes are not directly observable and, hence, the standard edf approach can be used only if the state process is identified first. Recall that, as a result of the estimation procedure described in Sect. 2.2, the so-called ‘smoothed inferences’ about the state process are derived. The smoothed inferences are the probabilities \(P(R_t=i|x_1,x_2,\ldots ,x_T)\) that the \(t\)th observation comes from a certain regime given the whole available information. Hence, a natural choice is to assign each observation to the most probable regime by letting \(R_t=i\) if \(P(R_t=i|x_1,x_2,\ldots ,x_T)>0.5\). Then, the testing procedure described in Sect. 3.1 is applicable. Note, however, that the hypothesis \(H_0\) now states that \((x_1,x_2,\ldots ,x_T)\) is driven by a regime-switching model with known state process values. We call this approach ‘ewedf’, which stands for ‘equally-weighted empirical distribution function’. It was introduced by Janczura and Weron (2009) in the context of electricity spot price MRS models.
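For a two-regime model, the ewedf classification reduces to thresholding the smoothed inferences; after the split, each subsample is tested with the ordinary edf, i.e., with weights \(1/n_i\). A minimal sketch with hypothetical function names; sending ties at exactly 0.5 to the second regime is our own tie-breaking choice.

```python
def ewedf_classify(smoothed_regime1, threshold=0.5):
    """Split time indices by the rule R_t = 1 if P(R_t = 1 | x_1..x_T) > 0.5.

    smoothed_regime1[t] is assumed to hold the smoothed inference for regime 1
    of a two-regime model; ties at exactly 0.5 go to regime 2 (our choice).
    """
    regime1 = [t for t, p in enumerate(smoothed_regime1) if p > threshold]
    regime2 = [t for t, p in enumerate(smoothed_regime1) if p <= threshold]
    return regime1, regime2


def equally_weighted_edf(values):
    """Ordinary edf of one regime's subsample: weight 1/n for each observation."""
    n = len(values)
    return lambda x: sum(1 for v in values if v <= x) / n


r1, r2 = ewedf_classify([0.9, 0.2, 0.6, 0.5])
print(r1, r2)  # [0, 2] [1, 3]
```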
3.2.2 The weighted empirical distribution function (wedf)
In the standard goodnessoffit testing approach based on the edf each observation is taken into account with weight \(\frac{1}{n}\) (i.e., inversely proportional to the size of the sample). However, in MRS models the state process is latent. The estimation procedure (the EM algorithm) only yields the probabilities that a certain observation comes from a given regime. Moreover, in the resulting marginal distribution of the MRS model each observation is, in fact, weighted with the corresponding probability. Therefore, a similar approach should be used in the testing procedure.
However, to the best of our knowledge, none of the applications of the wedf is related to goodness-of-fit testing of Markov regime-switching models. Here, we use the wedf concept to deal with the case when observations cannot be unambiguously classified to one of the regimes and, hence, a natural choice of the wedf weights seems to be \(w_t=P(R_t=i|x_1,x_2,\ldots ,x_T)=E(\mathbb{I }_{\{R_t=i\}}|x_1,x_2,\ldots ,x_T)\) for the \(i\)th regime observations.
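With these weights, the wedf of the \(i\)th regime sample is \(F_n(x)=\sum_t w_t\mathbb{I}_{\{y_t\le x\}}/\sum_t w_t\), and a KS-type distance is its supremum distance from the hypothesized cdf. The sketch below computes only this raw distance; any normalization used in the test statistics (14) and (18) is not applied here, and the function names and input values are hypothetical. With equal weights the distance coincides with the ordinary KS distance.

```python
from statistics import NormalDist


def wedf(sample, weights):
    """Weighted edf F_n(x) = sum_t w_t 1{y_t <= x} / sum_t w_t."""
    total = sum(weights)
    pairs = list(zip(sample, weights))
    return lambda x: sum(w for y, w in pairs if y <= x) / total


def weighted_sup_distance(sample, weights, cdf):
    """sup_x |F_n(x) - F(x)| for the weighted edf (no normalizing factor)."""
    order = sorted(range(len(sample)), key=lambda t: sample[t])
    total = sum(weights)
    cum = 0.0
    d = 0.0
    for t in order:
        f = cdf(sample[t])
        before = cum / total       # F_n just below sample[t]
        cum += weights[t]
        after = cum / total        # F_n at sample[t]
        d = max(d, after - f, f - before)
    return d


# Hypothetical residuals with smoothed-inference weights w_t = P(R_t = 1 | x_1..x_T)
y = [-0.4, 1.2, 0.1, -1.5, 0.7]
w = [0.95, 0.40, 0.99, 0.10, 0.80]
print(weighted_sup_distance(y, w, NormalDist().cdf))
```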
3.2.3 The wedf approach for specification II
Lemma 1
If \(H_0\) is true, then \(F_n\) given by (11) is an unbiased, consistent estimator of the distribution of the residuals (in this case Gaussian).
Note that the proofs of all lemmas and theorems formulated in this section can be found in the Appendix.
The following theorem yields a version of the KS test applicable to the parameter-switching MRS model (6). Note that if the state process were observable, it would reduce to the standard KS test (Lehmann and Romano 2005, p. 584).
Theorem 1
Lemma 2
If \(H_0\) is true, then \(F^i_n(x)\) given by (15) is an unbiased estimator of \(F^i(x)\). Moreover, it is consistent if \(\forall _{i,j=1,2}\) \(p_{ij}<1\).
An analogue of Theorem 1 can be derived.
Theorem 2
3.2.4 The wedf approach for specification I
Now, to test the fit of the mean-reverting regime, it is enough to calculate \(d^i_n\) according to formula (18) with the standard Gaussian cdf and \(y_t^1= g(x_{t},\mathbf{x}_{t-1})\). Observe that the observations from the second regime are i.i.d. by definition, so the testing procedure is straightforward with the \(F^2\) cdf and the sample \((x_1,x_2,\ldots ,x_{T})\). Moreover, the ‘whole model’ goodness-of-fit can also be verified. Theorem 1 is directly applicable if the distributions of the samples corresponding to both regimes are the same, \(F=F^1=F^2\). Observe that, even if \(F^1\ne F^2\), the test can still be applied using the fact that for \(X\sim F^2\) the variable \(Y=(F^1)^{-1}[ F^2 (X) ]\) is \(F^1\)-distributed. The test statistic \(d_n\) is calculated as in (14) with the \(F^1\) cdf (here Gaussian) and the sample \((y_1^{1},y_2^{1},\ldots ,y_{T-1}^{1},y_1^{2},y_2^{2},\ldots ,y_{T}^{2})\), where \((y_1^{1},y_2^{1},\ldots ,y_{T-1}^{1})\) are the transformed variables of the mean-reverting regime, i.e., \(y_t^{1}= g(x_{t,1},\mathbf{x}_{t-1})\), while \((y_1^{2},y_2^{2},\ldots ,y_{T}^{2})\) are the variables corresponding to the second regime, i.e., \(y_t^{2}=(F^1)^{-1}[ F^2 (x_t) ]\).
Note that, as in the case of an observable state process, in the wedf approach we face the problem of estimating values that are later used to compute the test statistic. Again, this problem can be circumvented with the help of Monte Carlo simulations. The \(p\) values can be computed as the proportion of simulated MRS model trajectories with a test statistic \(d_n\), see formulas (14) and (18), greater than or equal to the value of \(d_n\) obtained from the dataset.
3.3 The Rosenblatt transformation
The Rosenblatt transformation is a very useful and general tool; however, it can only be used to test the goodness-of-fit of the whole model. In contrast, the ewedf and the wedf approaches allow for testing the fit in the individual regimes. This can be advantageous in many situations, as we additionally learn which regimes are correctly and which are incorrectly specified. Moreover, the ewedf and the wedf approaches yield estimators of the regime and model cdfs, providing a readily available tool for further testing and model building. On the other hand, in the case of the Rosenblatt transformation an empirical distribution function can only be constructed for the transformed variables, which makes it hard to interpret.
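For comparison, the Rosenblatt transformation maps each observation through its one-step-ahead predictive cdf, \(z_t=\Phi^{-1}\big(\sum_i P(R_t=i|x_1,\ldots,x_{t-1})F_i(x_t|x_{t-1})\big)\), so that under a correctly specified model the \(z_t\) are i.i.d. \(N(0,1)\) and only a whole-model test is possible. The sketch below assumes a two-regime specification II model with Gaussian AR(1) regimes (no heteroskedasticity), computes the predictive regime probabilities with the Hamilton filter and starts the filter from the stationary distribution of the chain; these choices and the function name are ours, not details taken from this section. The usage lines simulate the assumed model itself and check that the transformed sample looks standard normal.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD = NormalDist()


def rosenblatt_spec2(x, alpha, beta, sigma2, P):
    """Rosenblatt-transformed sample z_2, ..., z_T for a 2-regime Gaussian AR(1) MRS model.

    Regime i: x_t = alpha[i] + (1 - beta[i]) x_{t-1} + sqrt(sigma2[i]) eps_t.
    P[i][j] = P(R_t = j | R_{t-1} = i), regimes coded 0 and 1.
    """
    # start from the stationary distribution of the Markov chain (assumption)
    pi0 = (1.0 - P[1][1]) / ((1.0 - P[0][0]) + (1.0 - P[1][1]))
    filt = [pi0, 1.0 - pi0]          # P(R_{t-1} = i | x_1..x_{t-1})
    z = []
    for t in range(1, len(x)):
        pred = [filt[0] * P[0][j] + filt[1] * P[1][j] for j in range(2)]
        u = 0.0
        joint = []
        for i in range(2):
            sd = math.sqrt(sigma2[i])
            e = (x[t] - alpha[i] - (1.0 - beta[i]) * x[t - 1]) / sd
            u += pred[i] * STD.cdf(e)                 # mixture predictive cdf
            joint.append(pred[i] * STD.pdf(e) / sd)   # regime prob times density
        total = sum(joint)
        filt = [p / total for p in joint]              # Hamilton filter update
        u = min(max(u, 1e-12), 1.0 - 1e-12)            # keep inv_cdf away from 0 and 1
        z.append(STD.inv_cdf(u))
    return z


# Usage: simulate the assumed model, so the z_t should look i.i.d. N(0, 1)
rng = random.Random(7)
P = [[0.6, 0.4], [0.5, 0.5]]
alpha, beta, sigma2 = [1.0, 3.0], [0.8, 0.4], [1.0, 0.5]
x, s = [1.25], 0
for _ in range(10000):
    if rng.random() >= P[s][s]:
        s = 1 - s
    x.append(alpha[s] + (1.0 - beta[s]) * x[-1] + math.sqrt(sigma2[s]) * rng.gauss(0.0, 1.0))
z = rosenblatt_spec2(x, alpha, beta, sigma2, P)
print(round(fmean(z), 2), round(pstdev(z), 2))  # close to 0 and 1
```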
4 Simulations
Table 1 Parameters of four 2-regime MRS models analyzed in the simulation study of Sect. 4

| Simulation | \(\alpha _1\) | \(\beta _1\) | \(\sigma _1^2\) | \(\alpha _2\) | \(\beta _2\) | \(\sigma _2^2\) | \(p_{11}\) | \(p_{22}\) |
|---|---|---|---|---|---|---|---|---|
| Sim #1 | 10.00 | 0.80 | 10.0000 | 4.00 | – | 0.5000 | 0.90 | 0.20 |
| Sim #2 | 1.00 | 0.80 | 1.0000 | 3.00 | 0.40 | 0.5000 | 0.60 | 0.50 |
| Sim #3 | 0.58 | 0.17 | 0.0048 | 1.28 | – | 0.0034 | 0.97 | 0.88 |
| Sim #4 | 0.63 | 0.81 | 0.0038 | 0.95 | 0.73 | 0.0200 | 0.96 | 0.91 |
4.1 Known model parameters
We generate 10,000 trajectories of each of the four 2-regime MRS models defined in Table 1. The length of each trajectory is 2,000 observations, which corresponds to 5.5 years of daily data (note that electricity markets operate 365 days per year). We apply the ewedf, the wedf and the Rosenblatt transformation-based goodness-of-fit tests to each simulated trajectory and then calculate the percentage of rejected hypotheses \(H_0\) at the 5 % significance level. We assume that the model parameters are known. The computation of \(E(X_{t,1}|\mathbf{x}_t)\) in the wedf approach requires a backward recursion until the previous observation from the mean-reverting regime is found, see (21). However, as the number of observations is limited, the condition \(P(R_t=1|\mathbf{x}_t)=1\) might not be fulfilled at all. The estimation scheme therefore requires some approximation or an additional assumption. Here, we assume that for each simulated trajectory the first observation comes from the mean-reverting regime.
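A trajectory of a specification I model such as Sim #1 can be generated as sketched below. The sketch rests on assumptions of ours that this section does not spell out: the mean-reverting regime \(X_{t,1}=\alpha+(1-\beta)X_{t-1,1}+\sigma\epsilon_t\) is updated at every \(t\), whether or not it is observed; the second regime is i.i.d. lognormal with \(\alpha_2\) and \(\sigma_2^2\) taken as the mean and variance of \(\log X_{t,2}\); and, as in the text, the first observation comes from the mean-reverting regime. The function name is hypothetical.

```python
import math
import random


def simulate_spec1(T, alpha, beta, sigma2, mu2, s2, p11, p22, seed=0):
    """Simulate a 2-regime specification I MRS model (independent regimes).

    Assumptions for this sketch: the base regime
    X_{t,1} = alpha + (1 - beta) X_{t-1,1} + sigma eps_t is updated at every t,
    whether or not it is observed; the second regime is i.i.d. lognormal with
    log X_{t,2} ~ N(mu2, s2); the first observation comes from regime 1.
    Returns the observed series and the (normally latent) regime path.
    """
    rng = random.Random(seed)
    sigma, s = math.sqrt(sigma2), math.sqrt(s2)
    x_base = alpha / beta              # start at the long-run mean of the base regime
    state = 1
    xs, states = [], []
    for t in range(T):
        if t > 0:
            stay = p11 if state == 1 else p22
            if rng.random() >= stay:
                state = 3 - state      # switch between regimes 1 and 2
        x_base = alpha + (1.0 - beta) * x_base + sigma * rng.gauss(0.0, 1.0)
        xs.append(x_base if state == 1 else rng.lognormvariate(mu2, s))
        states.append(state)
    return xs, states


# Sim #1 of Table 1, one trajectory of 2,000 observations
x, r = simulate_spec1(2000, alpha=10.0, beta=0.8, sigma2=10.0,
                      mu2=4.0, s2=0.5, p11=0.9, p22=0.2, seed=1)
```

Over a long run, the share of regime 1 should approach the stationary probability \((1-p_{22})/((1-p_{11})+(1-p_{22}))=0.8/0.9\).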
In the ewedf approach the tested hypothesis says that the state process is known (and coincides with the proposed classification of the observations to the regimes). As a consequence, once the regimes are identified, it is equivalent to the standard edf approach. To test how it performs for an MRS model with a latent state process, we apply it to the simulated trajectories (we first identify the regimes, then test whether the sample is generated from the assumed MRS model).
Table 2 Percentage of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level calculated from 10,000 simulated trajectories of 2,000 observations each of the four 2-regime MRS models defined in Table 1

| Simulation | ewedf First | ewedf Second | ewedf Model | wedf First | wedf Second | wedf Model | Rtrans Model | RCM |
|---|---|---|---|---|---|---|---|---|
| Sim #1 | 0.0641 | 0.8648 | 0.1242 | 0.0527 | 0.0540 | 0.0481 | 0.0542 | 4.33 |
| Sim #2 | 0.2244 | 0.0798 | 0.1194 | 0.0499 | 0.0527 | 0.0314 | 0.0493 | 12.69 |
| Sim #3 | 0.0586 | 0.4190 | 0.1060 | 0.0817 | 0.0458 | 0.0702 | 0.0902 | 12.81 |
| Sim #4 | 0.0519 | 0.3206 | 0.1025 | 0.0505 | 0.0389 | 0.0341 | 0.0553 | 29.63 |
Finally, to measure the quality of regime classification, we use the regime classification measure (RCM) of Ang and Bekaert (2002), see the last column in Table 2. Since the true regime is a Bernoulli random variable, the RCM statistic is essentially a sample estimate of its variance. The RCM statistic is rescaled so that a value of 0 means perfect regime classification and a value of 100 implies that no information about the regimes is revealed. In our case, the regime classification is very good or good for the specification I models used for modeling electricity prices in Sect. 5 (Sim #1 and Sim #3) and good or moderately good for the specification II models (Sim #2 and Sim #4). The lower RCM values for the specification I models also imply that in these models the regimes are better separated than in the respective specification II models. Finally, in the models whose parameters are estimated from the NEPOOL log-prices (Sim #3 and Sim #4) the regimes are less separated than in the other two models.
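For two regimes, the RCM of Ang and Bekaert (2002) can be written as \(\mathrm{RCM}=400\cdot\frac{1}{T}\sum_{t=1}^T p_t(1-p_t)\), where \(p_t\) is the smoothed probability of the first regime; the factor 400 turns the largest possible value of \(p_t(1-p_t)\), namely 1/4, into 100. A minimal sketch (function name hypothetical):

```python
def rcm_two_regimes(smoothed):
    """Regime classification measure for a 2-regime model.

    smoothed[t] = P(R_t = 1 | x_1..x_T). Returns a value in [0, 100]:
    0 for perfectly sharp classification, 100 if every p_t equals 0.5.
    """
    T = len(smoothed)
    return 400.0 * sum(p * (1.0 - p) for p in smoothed) / T


print(rcm_two_regimes([1.0, 0.0, 1.0, 1.0]))   # 0.0
print(rcm_two_regimes([0.5] * 10))             # 100.0
```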
4.2 Unknown model parameters
1. Estimate the parameter vector (\(\hat{\theta }\)) and calculate the test statistic (\(d_n\)) according to formula (9).
2. For ‘KS table’-type (‘KS tab.’) estimation, calculate the \(p\) value using the KS test tables and assuming that the sample comes from a model with parameter vector \(\hat{\theta }\).
3. For ‘MC simulation’-type (‘MC sim.’) estimation:
   (a) simulate \(N=500\) trajectories with parameter vector \(\hat{\theta }\) (these trajectories will be used to compute the estimate of the \(p\) value),
   (b) for each trajectory \(i=1,\ldots ,N\) estimate the parameter vector (\(\hat{\theta }_i\)) and calculate the test statistic (\(d_n^i\)),
   (c) calculate the \(p\) value as the proportion of simulated trajectories with the test statistic values higher than or equal to \(d_n\), i.e., \(\frac{1}{N}\#\{i: d_n^i\ge d_n\}\).
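Step 2 reads the \(p\) value from the KS distribution as if \(\hat\theta\) were the true parameter vector. For large samples the table values can be approximated by Kolmogorov's limit law, \(P(\sqrt{n}D_n>\lambda)\approx 2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2\lambda^2}\). The sketch below uses this asymptotic formula in place of exact tables (an approximation of our choosing, with a hypothetical function name); because it ignores the estimation of \(\hat\theta\), it overestimates the \(p\) value, as discussed in Sect. 3.1.3.

```python
import math


def ks_pvalue_asymptotic(d_n, n, terms=100):
    """Approximate P(D_n >= d_n) with the Kolmogorov limit distribution.

    Valid for known parameters and large n; with estimated parameters the
    result overestimates the true p value.
    """
    lam = math.sqrt(n) * d_n
    if lam < 0.2:
        return 1.0   # the limit-law tail probability exceeds 1 - 1e-12 here
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# sqrt(n) * d_n = 1.358 is the familiar 5 % critical value
print(round(ks_pvalue_asymptotic(0.1358, 100), 3))   # 0.05
```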
Table 3 Percentage of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level calculated from 500 simulated trajectories of 2,000 observations each of the models defined in Table 1 with parameters estimated from each sample

| Simulation | Method | ewedf First | ewedf Second | ewedf Model | wedf First | wedf Second | wedf Model | Rtrans Model |
|---|---|---|---|---|---|---|---|---|
| Sim #1 | KS tab. | 0.0360 | 0.5860 | 0.0540 | 0.0420 | 0 | 0.0400 | 0.0400 |
| | MC sim. | 0.0640 | 0 | 0.0580 | 0.0660 | 0.0580 | 0.0680 | 0.0620 |
| Sim #2 | KS tab. | 0.0040 | 0.1940 | 0.0080 | 0 | 0 | 0 | 0 |
| | MC sim. | 0.0320 | 0.0080 | 0.0360 | 0.0440 | 0.0340 | 0.0520 | 0.0460 |
| Sim #3 | KS tab. | 0 | 0.0140 | 0 | 0 | 0 | 0 | 0 |
| | MC sim. | 0.0360 | 0.0440 | 0.0400 | 0.0340 | 0.0500 | 0.0300 | 0.0380 |
| Sim #4 | KS tab. | 0.1400 | 0.0020 | 0.0120 | 0 | 0 | 0 | 0 |
| | MC sim. | 0.0340 | 0.0180 | 0.0560 | 0.0580 | 0.0380 | 0.0640 | 0.0420 |
Looking at the test results based on the KS test tables (‘KS tab.’ in Table 3), for the ewedf approach the rejection percentages deviate significantly from the \(5\,\%\) level. On the other hand, for the wedf and the Rosenblatt transformation-based approaches the \(p\) values are overestimated, which results in rejection percentages much lower than the 5 % significance level. Observe that for most of the models none of the hypotheses were rejected. Therefore, if the \(p\) values obtained with the wedf or the Rosenblatt transformation-based approaches are close to the significance level, the test may fail to reject a false \(H_0\) hypothesis. This is not the case for the testing approach utilizing Monte Carlo simulations (‘MC sim.’ in Table 3), as the obtained rejection percentages are close to the 5 % significance level. This example clearly shows that the wedf and the Rosenblatt transformation-based tests using the KS test tables can only be relied upon if they return a \(p\) value below the significance level (i.e., if they reject the \(H_0\) hypothesis) or well above it. However, if the obtained \(p\) value is close to the significance level, Monte Carlo simulations should be performed.
4.3 Power of the tests

AR-ARG1 vs. AR-AR The trajectories are simulated from an MRS model defined as
$$\begin{aligned} X_{t}=\alpha _{R_t}+ (1-\beta _{R_t})X_{t-1} +\sigma _{R_t}X_{t-1}^{\gamma _{R_t}} \epsilon _t, \quad R_t\in \{1,2\}, \end{aligned}$$
where \(\alpha _1=1\), \(\beta _1=0.8\), \(\sigma _1^2=1\), \(\gamma _1=0\), \(\alpha _2=3\), \(\beta _2=0.4\), \(\sigma _2^2=0.05\), \(\gamma _2=1\), \(p_{11}=0.6\) and \(p_{22}=0.5\). The model is denoted by AR-ARG1, which indicates that the first regime is driven by an AR(1) process and the second regime by a heteroskedastic autoregressive process with \(\gamma =1\) (i.e., ARG1). We test whether the simulated trajectories can be described by the model defined in Eq. (6), i.e., following specification II, and denoted here by AR-AR.
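A trajectory of the AR-ARG1 model can be simulated by first drawing the Markov chain and then applying the recursion with the parameters of the current regime. The sketch assumes regimes coded 0 and 1, a starting value \(X_0=\alpha_1/\beta_1=1.25\) and integer exponents \(\gamma\) (a non-integer power of a negative \(X_{t-1}\) would not be real); the function name is hypothetical.

```python
import math
import random


def simulate_spec2(T, alpha, beta, sigma2, gamma, P, x0, seed=0):
    """Simulate X_t = a_R + (1 - b_R) X_{t-1} + s_R X_{t-1}^{g_R} eps_t, R_t in {0, 1}.

    P[i][j] = P(R_t = j | R_{t-1} = i); gamma entries should be integers.
    """
    rng = random.Random(seed)
    x, state = x0, 0
    xs, states = [], []
    for _ in range(T):
        if rng.random() >= P[state][state]:
            state = 1 - state
        # gamma = 0 gives homoskedastic noise, gamma = 1 scales it by X_{t-1}
        scale = math.sqrt(sigma2[state]) * x ** gamma[state]
        x = alpha[state] + (1.0 - beta[state]) * x + scale * rng.gauss(0.0, 1.0)
        xs.append(x)
        states.append(state)
    return xs, states


# Parameters of the AR-ARG1 scenario
xs, rs = simulate_spec2(2000, alpha=[1.0, 3.0], beta=[0.8, 0.4],
                        sigma2=[1.0, 0.05], gamma=[0, 1],
                        P=[[0.6, 0.4], [0.5, 0.5]], x0=1.25, seed=1)
```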

AR-E vs. AR-LN The trajectories are simulated from an MRS model following specification I, see (4) and (5), with an exponential distribution in the second regime, i.e., \(F^2 \sim \text{Exp}(\lambda )\). The model is denoted here by AR-E and its parameters are given by: \(\alpha =10\), \(\beta =0.6\), \(\sigma ^2=10\), \(\lambda =30\), \(p_{11}=0.6\) and \(p_{22}=0.5\). We test whether the simulated trajectories can be driven by a model following specification I with a lognormal distribution in the second regime (i.e., AR-LN).

CIR-LN vs. AR-G The trajectories are simulated from an MRS model defined as
$$\begin{aligned} X_{t,1}&= \alpha _1+(1-\beta _1)X_{t-1,1} +\sigma _1 \sqrt{X_{t-1,1}}\, \epsilon _t,\\ X_{t,2}&\sim LN\left(\alpha _2,\sigma _2^2\right), \end{aligned}$$
where \(\alpha _1=1\), \(\beta _1=0.8\), \(\sigma _1^2=0.5\), \(\alpha _2=2\), \(\sigma _2^2=0.5\), \(p_{11}=0.6\) and \(p_{22}=0.5\), i.e., the first regime is a discrete-time version of the square root process, also known as the CIR process (Cox et al. 1985), and the second is a lognormal random variable. Hence the name CIR-LN. We test whether the simulated trajectories can be driven by a model following specification I with a Gaussian distribution in the second regime.
Percentages of rejected hypotheses \(H_0\) at the \(5\%\) significance level for the alternative models with parameters estimated for each of the 500 simulated trajectories of \(T=100\), \(500\) or \(2{,}000\) observations

| Scenario | T | Method | ewedf First | ewedf Second | ewedf Model | wedf First | wedf Second | wedf Model | Rtrans Model |
|---|---|---|---|---|---|---|---|---|---|
| AR-ARG1 vs. AR-AR | 2,000 | KS tab. | 0.6420 | 1.0000 | 1.0000 | 0.0300 | 1.0000 | 1.0000 | 0.9960 |
| | 2,000 | MC sim. | 0.0520 | 1.0000 | 1.0000 | 0.4000 | 1.0000 | 1.0000 | 1.0000 |
| | 500 | KS tab. | 0.1160 | 0.6100 | 0.5060 | 0.0060 | 0.4440 | 0.0860 | 0.0280 |
| | 500 | MC sim. | 0.0960 | 0.8280 | 0.8480 | 0.1840 | 0.9840 | 0.8780 | 0.9740 |
| | 100 | KS tab. | 0.0160 | 0.0120 | 0.0120 | 0.0020 | 0.0100 | 0 | 0 |
| | 100 | MC sim. | 0.1140 | 0.1520 | 0.2080 | 0.0800 | 0.2120 | 0.1580 | 0.2580 |
| AR-E vs. AR-LN | 2,000 | KS tab. | 0.0820 | 0.9980 | 0.9980 | 0.0620 | 0.9980 | 0.7460 | 0.9820 |
| | 2,000 | MC sim. | 0.0300 | 0.9680 | 0.7580 | 0.0560 | 0.9980 | 0.9620 | 0.9900 |
| | 500 | KS tab. | 0.0660 | 0.9780 | 0.5700 | 0.0660 | 0.2140 | 0.0580 | 0.1060 |
| | 500 | MC sim. | 0.0800 | 0.2120 | 0.2320 | 0.0940 | 0.8960 | 0.2760 | 0.4000 |
| | 100 | KS tab. | 0.0260 | 0.2780 | 0.0240 | 0.0300 | 0.0060 | 0.0080 | 0.0140 |
| | 100 | MC sim. | 0.0540 | 0.0180 | 0.0880 | 0.0960 | 0.1620 | 0.1260 | 0.1480 |
| CIR-LN vs. AR-G | 2,000 | KS tab. | 0.9900 | 0.9180 | 0.9900 | 0.9900 | 0.9400 | 0.9900 | 0.9900 |
| | 2,000 | MC sim. | 0.9900 | 0.9880 | 0.9900 | 0.9900 | 0.9900 | 0.9900 | 0.9900 |
| | 500 | KS tab. | 0.9940 | 0.1600 | 0.9900 | 0.9940 | 0.1880 | 0.9860 | 0.6840 |
| | 500 | MC sim. | 0.9920 | 0.7460 | 0.9860 | 0.9940 | 0.8120 | 0.9940 | 0.9940 |
| | 100 | KS tab. | 0.1860 | 0.0260 | 0.0760 | 0.2520 | 0.0340 | 0.1820 | 0.0200 |
| | 100 | MC sim. | 0.3200 | 0.2440 | 0.3760 | 0.7440 | 0.2920 | 0.7920 | 0.5560 |
Looking at the MC simulation results obtained for the largest samples of \(T=2{,}000\) observations, we can see that in almost all cases the false hypothesis was rejected. The lowest rejection rate was 0.7580 for the ewedf approach, 0.9620 for the wedf approach and 0.9900 for the Rosenblatt transformation. All three lowest rates were obtained for the challenging AR-E vs. AR-LN test scenario. However, for smaller samples the power of the tests decreases. A sample size of \(T=100\) observations does not seem to be enough, especially if the degree of regime separation is low, as in the AR-E vs. AR-LN scenario. This is not the case if the definitions of both regimes are significantly different, as in the CIR-LN vs. AR-G scenario, for which the power is satisfactory even for \(T=100\). Comparing the ewedf and wedf approaches, we can observe that the latter yields higher rejection rates on average. A comparison of the wedf and the Rosenblatt transformation-based approaches does not yield such a clear picture. For the AR-ARG1 vs. AR-AR scenario the power of the Rosenblatt transformation-based approach is higher than that of the wedf approach, for the AR-E vs. AR-LN scenario it is only slightly higher, while for the CIR-LN vs. AR-G scenario it is the wedf approach which produces higher rejection rates (for small samples).
Percentages of rejected hypotheses \(H_0\) at the \(5\,\%\) significance level for the alternative models with parameters estimated for each of the 500 simulated trajectories of \(T=100\), \(500\) or \(2{,}000\) observations

| Scenario | T | Method | ewedf First | ewedf Second | ewedf Model | wedf First | wedf Second | wedf Model | Rtrans Model |
|---|---|---|---|---|---|---|---|---|---|
| AR-ARG1 vs. AR-AR | 2,000 | KS tab. | 0 | 0.2300 | 0.0080 | 0 | 0 | 0 | 0 |
| | 2,000 | MC sim. | 0.0560 | 0.0200 | 0.0720 | 0.0660 | 0.0440 | 0.0700 | 0.0560 |
| | 500 | KS tab. | 0.0080 | 0.0620 | 0.0160 | 0 | 0 | 0 | 0 |
| | 500 | MC sim. | 0.0260 | 0.0020 | 0.0520 | 0.0480 | 0.0160 | 0.0560 | 0.0460 |
| | 100 | KS tab. | 0.0120 | 0.0380 | 0.0040 | 0 | 0 | 0 | 0 |
| | 100 | MC sim. | 0.0280 | 0.0320 | 0.0840 | 0.0380 | 0.0040 | 0.0560 | 0.0660 |
| AR-E vs. AR-LN | 2,000 | KS tab. | 0.9980 | 0.7420 | 0.9980 | 0.9980 | 0.7700 | 0.9980 | 0.9980 |
| | 2,000 | MC sim. | 0.9980 | 0.4500 | 0.9980 | 0.9980 | 0.8440 | 0.9980 | 0.9980 |
| | 500 | KS tab. | 0.8600 | 0.2000 | 0.8600 | 0.8640 | 0.1620 | 0.8640 | 0.8600 |
| | 500 | MC sim. | 0.8740 | 0.1620 | 0.8740 | 0.8740 | 0.3480 | 0.8760 | 0.8760 |
| | 100 | KS tab. | 0.3200 | 0.0020 | 0.3200 | 0.3200 | 0.0020 | 0.3200 | 0.3140 |
| | 100 | MC sim. | 0.3560 | 0.0620 | 0.3660 | 0.3520 | 0.0380 | 0.3600 | 0.3560 |
| CIR-LN vs. AR-G | 2,000 | KS tab. | 0.0480 | 0.4160 | 0.0860 | 0.0660 | 0.0260 | 0.0800 | 0.0540 |
| | 2,000 | MC sim. | 0.0940 | 0.1200 | 0.0960 | 0.0680 | 0.1180 | 0.0800 | 0.0840 |
| | 500 | KS tab. | 0.0560 | 0.0540 | 0.0500 | 0.0820 | 0.0120 | 0.0540 | 0.0580 |
| | 500 | MC sim. | 0.0900 | 0.0660 | 0.1060 | 0.1080 | 0.0780 | 0.1080 | 0.1200 |
| | 100 | KS tab. | 0.0100 | 0.0160 | 0.0100 | 0.0180 | 0.0100 | 0.0080 | 0.0160 |
| | 100 | MC sim. | 0.0440 | 0.0540 | 0.1360 | 0.0740 | 0.0700 | 0.1540 | 0.1520 |
5 Application to electricity spot prices

a 2-regime MRS model with a mean-reverting base regime (\(R_t=1\)), see (4), and i.i.d. shifted lognormally distributed spikes (\(R_t=2\)) or

a 3-regime MRS model with a mean-reverting base regime (\(R_t=1\)), see (4), i.i.d. shifted lognormally distributed spikes (\(R_t=2\)) and i.i.d. drops (\(R_t=3\)) distributed according to the inverted shifted lognormal law.
Furthermore, recall that \(X\) follows the shifted lognormal law or the inverted shifted lognormal law if \(\log (X-q)\), respectively \(\log (q-X)\), has a Gaussian distribution. The cutoff level \(q\) can be different for the spike and the drop regime; here, motivated by the results of Janczura and Weron (2013), we set it to the first and the third quartile of the dataset for drops and spikes, respectively. Using shifted lognormal distributions increases the degree of regime separation as measured by the RCM and, hence, the power of the tests (see Sect. 4). For instance, the 2-regime model now yields an RCM of 4.79, compared to 12.81 for Sim #3 in Table 2. It also leads to more fundamentally justified models, since electricity price spikes are generally connected with scheduling units with higher marginal costs (like gas turbines, see e.g., Eydeland and Wolyniec 2012; Weron 2006).
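Because \(\log(X-q)\) and \(\log(q-X)\) are Gaussian under these laws, the transformation to the \(N(0,1)\) scale needed for a ‘whole model’ test has a closed form. If \(\log(X-q)\sim N(\alpha,s^2)\), then \(\Phi^{-1}[F(x)]=(\log(x-q)-\alpha)/s\); for the inverted law, \(F(x)=P(\log(q-X)\ge\log(q-x))=1-\Phi\big((\log(q-x)-\alpha)/s\big)\), so \(\Phi^{-1}[F(x)]=-(\log(q-x)-\alpha)/s\). The sketch below codes these formulas together with the quartile cutoffs; the helper names are hypothetical, and `statistics.quantiles` uses its default ‘exclusive’ method, which may differ from the quartile convention used in the paper.

```python
import math
from statistics import quantiles


def spike_to_normal(x, q, alpha, s2):
    """Phi^{-1}[F(x)] for the shifted lognormal law log(X - q) ~ N(alpha, s2); needs x > q."""
    return (math.log(x - q) - alpha) / math.sqrt(s2)


def drop_to_normal(x, q, alpha, s2):
    """Phi^{-1}[F(x)] for the inverted shifted lognormal law log(q - X) ~ N(alpha, s2); needs x < q."""
    return -(math.log(q - x) - alpha) / math.sqrt(s2)


def cutoffs(data):
    """Cutoff levels: first quartile for drops, third quartile for spikes."""
    q1, _, q3 = quantiles(data, n=4)
    return q1, q3


print(cutoffs([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))   # (2.25, 6.75)
```

The sign flip for drops keeps the transformed values increasing in \(x\): a drop closer to the cutoff \(q\) maps to a larger standard normal value.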
Finally, note that such simple one-factor models may not be complex enough to address all features of electricity prices. In particular, the electricity forward prices implied by these spot price models exhibit the so-called Samuelson effect (i.e., a decrease in volatility with increasing time to maturity; for the considered models the volatility scales as \(e^{-\beta (T-t)}\)), but the rate of decrease is completely determined by the speed of mean reversion \(\beta \) (Janczura and Weron 2012). However, the rate of decrease should be large only for maturities up to a year (Kiesel et al. 2009). Perhaps incorporating another stochastic factor would lead to a more realistic forward price curve.
Following Weron (2009), the deseasonalization is conducted in three steps; for a thorough study of modeling seasonal components in electricity spot prices we refer to Janczura et al. (2012). First, the long-term seasonal component (LTSC) \(T_t\) is estimated from daily spot prices \(P_t\) using a wavelet filter-smoother of order 6. A single non-parametric LTSC is used here to represent the long-term non-periodic fuel price levels, the changing climate/consumption conditions throughout the years and strategic bidding practices. As shown by Janczura and Weron (2010), the wavelet-estimated LTSC reflects fairly well the ‘average’ fuel price level, understood as a combination of natural gas, crude oil and coal prices; see also Eydeland and Wolyniec (2012) and Karakatsani and Bunn (2010) for a treatment of fundamental and behavioral drivers of electricity prices. On the other hand, as discussed recently in Janczura and Weron (2012), the use of the wavelet-based LTSC is somewhat controversial. Predicting it beyond the next few weeks is a difficult task, because individual wavelet functions are quite localized in time or (more generally) in space. Preliminary research suggests, however, that despite this feature the wavelet-based LTSC can be extrapolated into the future, yielding a better on-average prediction of the level of future spot prices than an extrapolation of a sinusoidal LTSC (Nowotarski et al. 2011).
Parameters of the 2- and 3-regime MRS models with a mean-reverting base regime and independent spikes and drops driven by shifted log-normal laws, fitted to the deseasonalized NEPOOL log-prices

| Model | Base \(\alpha\) | Base \(\beta\) | Base \(\sigma^2\) | Spike \(\alpha_2\) | Spike \(s_2^2\) | Drop \(\alpha_3\) | Drop \(s_3^2\) | \(p_{11}\) | \(p_{22}\) | \(p_{33}\) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2-regime | 0.71 | 0.21 | 0.0060 | \(-1.57\) | 0.32 | – | – | 0.98 | 0.74 | – |
| 3-regime | 0.99 | 0.29 | 0.0053 | \(-1.65\) | 0.33 | \(-1.93\) | 0.19 | 0.96 | 0.76 | 0.89 |
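The transition probabilities in the table above can be read through two standard Markov chain facts: the expected duration of regime \(i\) is \(1/(1-p_{ii})\), and for a 2-state chain the stationary probability of regime 2 is \((1-p_{11})/(2-p_{11}-p_{22})\). The sketch applies them to the 2-regime estimates; for the 3-regime model the stationary probabilities cannot be computed from the diagonal alone, because the off-diagonal transition probabilities are not reported here. Function names are illustrative.

```python
def expected_duration(p_stay: float) -> float:
    """Mean number of consecutive days in a regime with self-transition prob p_stay."""
    return 1.0 / (1.0 - p_stay)


def stationary_two_state(p11: float, p22: float) -> tuple[float, float]:
    """Unconditional (ergodic) probabilities of regimes 1 and 2 in a 2-state chain."""
    pi2 = (1.0 - p11) / (2.0 - p11 - p22)
    return 1.0 - pi2, pi2


# 2-regime NEPOOL estimates (base: p11, spike: p22)
p11, p22 = 0.98, 0.74
base_days, spike_days = expected_duration(p11), expected_duration(p22)
pi_base, pi_spike = stationary_two_state(p11, p22)
print(f"expected durations: base {base_days:.1f} days, spike {spike_days:.2f} days")
print(f"stationary probabilities: base {pi_base:.3f}, spike {pi_spike:.3f}")
```

For the 2-regime fit this gives roughly 50 days per base-regime visit, under 4 days per spike episode, and a long-run spike share of about 7%.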
For both analyzed MRS models, tests based on the ewedf, the wedf and the Rosenblatt transformation are performed. The \(p\) values in Table 7 are reported both for the standard approach using the KS test tables (which generally leads to overestimated \(p\) values) and for the much slower but more accurate Monte Carlo setup. The testing procedure in the Monte Carlo case is analogous to the one used in the simulation study; see Sect. 4.2 for a detailed description. Again, in order to verify the 'whole model' goodness-of-fit, we transform the spike and drop regime observations so that both samples are \(N(0,1)\)-distributed.
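The transformation to \(N(0,1)\) can be sketched as below, assuming (for illustration) that spikes satisfy \(\log(X_t-q)\sim N(\alpha_2,s_2^2)\) for \(X_t>q\) and drops satisfy \(\log(q'-X_t)\sim N(\alpha_3,s_3^2)\) for \(X_t<q'\). The shifts \(q\) and \(q'\) are not reported in this section, so the values used below are placeholders, and the function name is hypothetical.

```python
import numpy as np


def standardize_shifted_lognormal(x, shift, mu, s2, upper=True):
    """Map shifted log-normal observations to N(0,1) scores.

    upper=True  (spikes): log(x - shift) ~ N(mu, s2), requires x > shift.
    upper=False (drops):  log(shift - x) ~ N(mu, s2), requires x < shift.
    """
    x = np.asarray(x, dtype=float)
    gap = x - shift if upper else shift - x
    if np.any(gap <= 0):
        raise ValueError("observations must lie strictly beyond the shift")
    return (np.log(gap) - mu) / np.sqrt(s2)


# Synthetic spikes (shift, mu and s2 are hypothetical values)
rng = np.random.default_rng(0)
spikes = 3.5 + np.exp(rng.normal(-1.6, np.sqrt(0.3), size=1000))
z = standardize_shifted_lognormal(spikes, shift=3.5, mu=-1.6, s2=0.3)
print(round(z.mean(), 2), round(z.std(), 2))  # should be close to 0 and 1
```

The standardized spike and drop scores can then be pooled and compared with the standard normal CDF in a single KS test.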
The \(p\) values of the KS test based on the ewedf, the wedf and the Rosenblatt transformation (Rtrans) for the 2- and 3-regime MRS models of the deseasonalized NEPOOL log-prices

| Model | Method | ewedf Base | ewedf Spike | ewedf Drop | ewedf Model | wedf Base | wedf Spike | wedf Drop | wedf Model | Rtrans Model |
|---|---|---|---|---|---|---|---|---|---|---|
| 2-regime | KS tab. | 0.15 | 0.54 | – | 0.2170 | 0.05 | 0.87 | – | 0.0960 | 0.2850 |
| 2-regime | MC sim. | 0.00 | 0.34 | – | 0.0240 | 0.00 | 0.50 | – | 0.0020 | 0.0470 |
| 3-regime | KS tab. | 0.42 | 0.58 | 0.99 | 0.3950 | 0.29 | 0.78 | 0.99 | 0.3550 | 0.3900 |
| 3-regime | MC sim. | 0.08 | 0.26 | 0.91 | 0.0780 | 0.08 | 0.35 | 0.96 | 0.0820 | 0.1190 |
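Why tabulated KS \(p\) values tend to be too high when parameters are estimated from the same data can be seen in a simplified setting: testing normality of an i.i.d. sample with estimated mean and variance (a Lilliefors-type problem). The sketch below is not the procedure of Sect. 4.2, which simulates the full MRS model; it only illustrates the Monte Carlo principle of simulating from the fitted model, refitting, recomputing the statistic and counting exceedances. The asymptotic Kolmogorov series stands in for the KS tables; all function names are illustrative.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def ks_stat_normal(x, mu, sigma):
    """Sup-distance between the empirical CDF of x and the N(mu, sigma^2) CDF."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    f = 0.5 * (1.0 + _erf((xs - mu) / (sigma * math.sqrt(2.0))))
    d_plus = np.max(np.arange(1, n + 1) / n - f)   # F_n(x_(k)) - F(x_(k))
    d_minus = np.max(f - np.arange(n) / n)         # F(x_(k)) - F_n(x_(k)-)
    return float(max(d_plus, d_minus))


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p value; valid only for a fully specified F."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # Q(lam) is numerically 1 here and the series converges slowly
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def mc_pvalue_normal(x, n_sim=999, seed=0):
    """Monte Carlo p value: simulate from the fitted normal, refit, recompute D."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    mu, sigma = x.mean(), x.std(ddof=1)
    d_obs = ks_stat_normal(x, mu, sigma)
    exceed = 0
    for _ in range(n_sim):
        y = rng.normal(mu, sigma, x.size)
        if ks_stat_normal(y, y.mean(), y.std(ddof=1)) >= d_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Heavy-tailed synthetic sample (hypothetical data)
rng = np.random.default_rng(42)
sample = rng.standard_t(df=5, size=300)
d = ks_stat_normal(sample, sample.mean(), sample.std(ddof=1))
print("table-style p:", kolmogorov_pvalue(d, sample.size))
print("Monte Carlo p:", mc_pvalue_normal(sample))
```

Because refitting pulls the fitted CDF towards the data, the simulated statistics are typically smaller than the table assumes, so the Monte Carlo \(p\) value is usually the lower of the two, in line with the gap between the "KS tab." and "MC sim." rows.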
6 Conclusions
While most of the electricity spot price models proposed in the literature are elegant, their fit to empirical data has either not been examined thoroughly or the signs of a bad fit have been ignored. As the empirical study of Sect. 5 has shown, even reasonable-looking and popular models should be carefully tested before they are put to use in trading or risk management departments. The wedf-based goodness-of-fit test introduced in Sect. 3.2.2 provides an efficient tool for accepting or rejecting a given Markov regime-switching (MRS) model for a particular data set. While its performance (including power; see Sect. 4.3) is similar to that of the Rosenblatt transformation-based approach, it has an edge over the latter by yielding \(p\) values for the individual regimes. For instance, this allows one to observe that in the 3-regime model the worst fit is obtained for the base regime. Perhaps the simple AR(1) structure is not enough to model the complex dynamics of electricity spot prices in the relatively calm, non-spiky periods.
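For comparison, a Rosenblatt-transformation check can be sketched as follows, assuming that the one-step-ahead regime probabilities \(P(R_t=i\mid x_1,\dots,x_{t-1})\) and the regime-wise conditional CDFs evaluated at \(x_t\) have already been computed from a fitted MRS model (for instance, with the Hamilton filter). Under a correct specification the values \(u_t\) are i.i.d. \(U(0,1)\) and are tested with a single KS test, which is why this approach yields only a 'whole model' \(p\) value. Function names are illustrative.

```python
import numpy as np


def rosenblatt_pit(pred_probs, cond_cdfs):
    """u_t = sum_i P(R_t = i | past) * F_i(x_t | past, R_t = i).

    pred_probs: (T, K) predicted regime probabilities, rows summing to 1.
    cond_cdfs:  (T, K) regime-wise conditional CDFs evaluated at x_t.
    """
    p = np.asarray(pred_probs, dtype=float)
    f = np.asarray(cond_cdfs, dtype=float)
    if p.shape != f.shape:
        raise ValueError("pred_probs and cond_cdfs must have the same shape")
    return np.sum(p * f, axis=1)


def ks_uniform(u):
    """KS sup-distance between the empirical CDF of u and the U(0,1) CDF."""
    us = np.sort(np.asarray(u, dtype=float))
    n = us.size
    d_plus = np.max(np.arange(1, n + 1) / n - us)
    d_minus = np.max(us - np.arange(n) / n)
    return float(max(d_plus, d_minus))
```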
However, in this paper we have not restricted ourselves to MRS models but pursued a more general goal. Namely, we have proposed a goodness-of-fit testing scheme for the marginal distribution of regime-switching models, including variants with an observable and with a latent state process, and described the testing procedure for both specifications. The models with a latent state process (i.e., MRS models) required the introduction of the weighted empirical distribution function (wedf) and a generalization of the Kolmogorov–Smirnov test to yield an efficient testing tool.
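The central objects can be sketched as follows, assuming that the weight of observation \(t\) for regime \(i\) is the probability \(w_t=P(R_t=i\mid x_1,\dots,x_T)\), so that the wedf of regime \(i\) is \(F_n(x)=\sum_t w_t\,\mathbb 1\{x_t\le x\}/\sum_t w_t\). The function returns only the supremum distance \(\sup_x|F_n(x)-F(x)|\) against a hypothesized continuous CDF; the scaling of the statistic and its critical values follow Sect. 3.2.2 and are not reproduced here. The name `wedf_ks_distance` is hypothetical.

```python
from typing import Callable

import numpy as np


def wedf_ks_distance(x, w, cdf: Callable[[np.ndarray], np.ndarray]) -> float:
    """sup_x |F_n(x) - F(x)| for the weighted EDF F_n(x) = sum w_t 1{x_t <= x} / sum w_t.

    A step function minus a continuous CDF attains its extreme deviations at
    the jumps, so F is compared with F_n just at and just before each sorted
    observation.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.shape != w.shape or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("need non-negative weights shaped like x, not all zero")
    order = np.argsort(x)
    xs, ws = x[order], w[order] / w.sum()
    after = np.cumsum(ws)     # F_n at each sorted observation
    before = after - ws       # left limit F_n(x_(k)-)
    f = cdf(xs)
    return float(max(np.max(np.abs(after - f)), np.max(np.abs(before - f))))


# Illustration against U(0,1) with random stand-in weights (hypothetical data)
rng = np.random.default_rng(1)
sample = rng.uniform(size=500)
weights = rng.uniform(size=500)
print(wedf_ks_distance(sample, weights, lambda t: np.clip(t, 0.0, 1.0)))
```

With equal weights the function reduces to the ordinary KS distance, while observations with weight zero (those certainly not from regime \(i\)) do not move \(F_n\) at all.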
We have focused on two specifications of regime-switching models commonly used in the energy economics literature: one with dependent autoregressive states and one with independent autoregressive and i.i.d. regimes. Nonetheless, the proposed approach can easily be applied to other specifications of regime-switching models (for instance, to 3-regime models with heteroscedastic base regime dynamics; see Janczura and Weron 2010). Very likely it can also be extended to other edf-type goodness-of-fit tests, like the Anderson–Darling test. As the latter puts more weight on observations in the tails of the distribution than the Kolmogorov–Smirnov test, it might be more discriminatory and provide a better testing tool for extremely spiky data. Future work will be conducted in this direction.
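For reference, the standard (unweighted) forms of the two statistics show where the difference in tail emphasis comes from: the Anderson–Darling integrand divides the squared discrepancy by \(F(x)\bigl(1-F(x)\bigr)\), which is small in both tails, so deviations there are amplified.

\[
D_n=\sup_{x\in\mathbb R}\bigl|F_n(x)-F(x)\bigr|,\qquad
A_n^2=n\int_{-\infty}^{\infty}\frac{\bigl(F_n(x)-F(x)\bigr)^2}{F(x)\bigl(1-F(x)\bigr)}\,\mathrm dF(x).
\]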
Finally, note that a good in-sample fit does not necessarily imply good forecasting performance. Although Kosater and Mosler (2006) found for German electricity price data that for long-run point forecasts (30–80 days ahead) an MRS model with regimes driven by two AR(1) processes was slightly more accurate than a simple AR(1) model, for shorter horizons both model classes performed alike. It remains an open question how the MRS models fitted to NEPOOL log-prices in Sect. 5 perform in terms of forecasting. The adequacy of MRS models for forecasting in general has been questioned by Bessec and Bouabdallah (2005). However, as Weron and Misiorek (2008) have shown, regime-switching models may perform better than their linear competitors in volatile periods. They might also have an edge in density forecasts, but this has yet to be verified.
Acknowledgments
We thank two anonymous reviewers for their comments and suggested improvements. This paper has also benefited from conversations with the participants of the DStatG 2010 Annual Meeting, the Trondheim Summer 2011 Energy Workshop, the 2011 WPI Conference in Energy Finance, the Energy Finance Christmas Workshop (EFC11), the 9th International Conference ‘European Energy Market’ (EEM12) and the seminars at Macquarie University, National University of Singapore and University of Poznań. This work was supported by funds from the National Science Centre (NCN, Poland) through grant no. 2011/01/B/HS4/01077.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Appendix
Proof
[Theorem 1] First, note that \(F(x)\in \{0, 1\}\) implies \(F_n(x)=F(x)\) and \(\sup _{x\in \mathbb R }|F_n(x)-F(x)|=\sup _{x\in D }|F_n(x)-F(x)|\), where \(D=\mathbb R \setminus \{x:F(x)=0 \vee F(x)=1\}\). Therefore, in the following we limit ourselves to the case \(0<F(x)<1\).
Proof