1 Introduction

The future energy system will have an important impact on several societal goals such as climate change mitigation, energy security, and air pollution. There are several types of energy system models that are used to provide policy support for the transition towards more renewable energy [1]. Some are pure dispatch models, i.e. they focus on a fixed set of generation-, storage-, and transmission capacities. Some, however, also take into account the transformation of the energy system, i.e. they represent investments in generation-, transmission- and storage capacity. These may include only the electricity sector (as in [2,3,4]) or they may include several sectors (as in [5,6,7]). Here, we focus on energy system models with investment and where electricity generation is important and denote them Capacity Expansion Models (CEMs). The last few years have seen an increasing number of studies using such models to investigate power systems in which the share of variable renewable energy (VRE) may exceed 50%, and sometimes even reach 100% in a large area such as Europe or the US [8,9,10,11,12,13,14,15,16,17]. Such studies support policies, either directly [18], or in order to contribute more general knowledge about e.g. cost [19], the dominating generation technologies [19], or about optimal national strategies to transition into a CO2 neutral future energy system [3]. The models underpinning these studies are large and complex and have therefore been subject to simplification [20]. Although there are now several models which use hourly resolution for an entire year [21,22,23,24], there are also models that have a reduced temporal representation, e.g. by representing the entire year using a handful of days, such as in references [12, 18, 25] or by the method known as “time-slicing”, as in reference [6].

There is a growing literature that investigates how to best perform this reduction while still preserving an output similar to that of a model with a year’s worth of hourly data. The focus in the temporal reduction literature has partly been to compare several methods for finding representative days against each other [26,27,28], and partly to compare the outputs using reduced time series to outputs with a more extensive temporal representation [26,27,28,29,30]. Regarding the latter branch, the comparison has been done regarding different quantities/metrics: Nahmmacher et al. [30] compared the error of the representative days method for total cost and total VRE capacity in the model EU-LIMES, and found that these two quantities converged at 25 representative days. Reichenberg et al. [26] came to a similar conclusion in a simple 1-node model. Merrick [29] used the L1 metric, which takes into account errors for all the constituents of the objective function, and found that around 150 representative days were necessary to get a prediction within 10%. Pineda and Morales [28] compared wind capacity and cost and found that the cost error with their method was around 6% for a number of time steps equivalent to 28 days. Gonzato et al. [31] investigated the error specifically for storage capacity and found that some techniques were appropriate for storage, yet could still be incorrect by a factor of two. The literature is thus focused on how many time steps are necessary in order to reasonably duplicate the results of an hourly model, yet the results differ between studies. Moreover, the studies do not systematically investigate whether different kinds of questions relating to policy support, such as questions regarding system cost, capacity mix or regional strategies, potentially require different levels of detail in the temporal representation.
In many papers that aim at providing policy support, different numbers of representative days are used [12, 18, 25], and it is not clear whether the time reduction practice in these papers significantly distorts the results and conclusions. Thus, the literature on temporal reduction methods does not sufficiently address the question of whether the number of time steps required to achieve a close enough estimate differs depending on whether it is e.g. system cost or national strategies that are to be assessed. On the other side, model studies have not considered the possible error due to a reduced temporal representation, including whether this error may differ depending on the quantities (e.g. system cost or allocation of capacities on a national level) crucial to the research question. In this paper, we seek to address this gap by specifically targeting the error introduced by temporal reduction regarding the diverse policy-relevant outputs from a CEM. Specifically, we address the research question:

What is the relationship between number of time steps and error estimate of

  • System cost

  • Total capacity mix

  • Regional capacity mix

The purpose of our paper is thus meta-methodological: to investigate whether the way Capacity Expansion Models with reduced temporal dimension are used is appropriate to provide intended insights. Using the results from our quantitative analysis we discuss the validity of results from papers using energy system models for policy support.

We explore a large range of investment costs to account for a variety of system configurations. We investigate model input with between 24 and 4800 time steps (i.e. a reduction by a factor of between 1.8 and 365, compared to the hourly resolution of 8760 time steps). The time reduction method applied is the representative days method first outlined in Nahmmacher et al. [30], which is (i) compatible with most models with only small changes and (ii) widely used, in various versions, in CEMs [12, 18, 25].

2 Materials and methods

We compare the output from a CEM for sets with varying temporal representation drawn from a complete year-long hourly data set of electricity demand and wind and solar output. The results from running the CEM with these differently sized sets, sampled with a method based on clustering, are compared to results from running the CEM using data from the entire year (8760 time steps). The model results are analyzed with three different types of metrics, further described below.

2.1 Sampling method

We use a method based on clustering to find representative days and weight them. The method is similar to that introduced by Nahmmacher et al. [30], and further clarified in Pineda et al. [28]. See Reichenberg et al. [26] for a general discussion of different methods for simplifying the temporal representation in CEMs. Hoffmann et al. [32] also offer an overview of the procedure to find representative periods.

The representative days are found as follows, with the specific choices made here stated in each step:

  1. Define demand, wind, and solar time series to represent each region: Select the wind sites with annual capacity factor > 25% and solar sites with > 15%; average over these sites.

  2. Normalize the time series \({S}_{rt}\) of each region so that \(\max_{t} {S}_{rt}=1, \forall r\). This means that each time series (wind, solar, demand) reaches a maximum of 1 in every region.

  3. Choose a time period for which data will be consecutive. Here, we use one-day periods, testing five-day periods in the sensitivity analysis.

  4. Form vectors of the time series so that each vector consists of data for the period chosen. This will consist of ordered (normalized according to step 2) wind, solar, and demand (here labeled “resources”) data for all regions and the period length chosen. In the application here, the data set consists of a year of hourly data, the period is 1 day, and there are 8 model regions, so that there are 365 vectors, each of which consists of (#resources)*(#representative days)*(#time steps per day)*(#regions) = 3*1*24*10 = 720 elements.

  5. Cluster the vectors into the desired number of clusters (here using hierarchical clustering with between 1 and 200 clusters for the case of one-day periods).

  6. Find the cluster centroid and pick the vector closest to the centroid as the cluster representative. Weight the vector according to the cluster size.

This procedure results in a subset of the original time steps, \(T^{\prime}\subset T\), for which weights \({\omega }_{t}\) are assigned. The sum of the weights equals the original number of time steps, \(\sum_{t\in T^{\prime}}{\omega }_{t}=8760\).
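The six steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: it assumes a single synthetic demand series for one region, a 30-day toy data set, and agglomerative clustering with centroid linkage (the linkage criterion is not stated in the text). All function and variable names are hypothetical.

```python
import math
import random

HOURS_PER_DAY = 24

def normalize(series):
    """Scale a series so that its maximum equals 1 (step 2)."""
    peak = max(series)
    return [value / peak for value in series]

def distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def representative_periods(vectors, n_clusters):
    """Return {index of representative period: cluster size}."""
    clusters = [[i] for i in range(len(vectors))]  # start from singletons
    while len(clusters) > n_clusters:
        centers = [centroid([vectors[i] for i in c]) for c in clusters]
        # Merge the two clusters whose centroids are closest (step 5).
        _, a, b = min(
            (distance(centers[a], centers[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    weights = {}
    for members in clusters:
        center = centroid([vectors[i] for i in members])
        # Pick the actual period closest to the centroid (step 6).
        representative = min(members, key=lambda i: distance(vectors[i], center))
        weights[representative] = len(members)
    return weights

# Synthetic toy data (assumption): 30 days of hourly "demand" for one region.
random.seed(0)
N_DAYS = 30
demand = normalize([
    1.0 + 0.5 * math.sin(2 * math.pi * hour / HOURS_PER_DAY) + 0.2 * random.random()
    for hour in range(N_DAYS * HOURS_PER_DAY)
])
# Step 4: one vector per one-day period.
vectors = [demand[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY] for d in range(N_DAYS)]
weights = representative_periods(vectors, n_clusters=5)
# Each hour of a representative day carries the weight of its cluster,
# so the hourly weights sum to the original number of time steps.
total_hours = sum(size * HOURS_PER_DAY for size in weights.values())
print(weights, total_hours)
```

In this toy case the hourly weights sum to 30 × 24 = 720, mirroring the condition that the weights sum to 8760 for a full year.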

2.2 Model

We use a stylized CEM for Europe with 8 regions, see [33] and Fig. 1 in the Supplementary material. In general, the model displays the core features of CEMs (investment in several generation technologies including wind and solar, transmission expansion option and storage options), while being simplified in terms of storage options and other technical features. The implications of the stylized approach are further elaborated on in Sect. 4. We use a greenfield approach (i.e. no existing transmission-, storage- and generation capacity), and investments are made overnight. The mathematical formulation of the model and a map of the region boundaries may be found in the supplementary material. The input data and details are explained further in the supplementary material.

Fig. 1 Deviation from the benchmark value for system cost, using 1–200 representative days (24-h periods). Each line represents one technology investment cost combination, see the Method section and Table 1

It is a zonal model, where resulting transmission capacities are interpreted as Net Transfer Capacity (NTC) values, or as capacities in an HVDC (High Voltage Direct Current) network. This model treats electricity as it would other goods, i.e., without taking Kirchhoff’s laws into account. The decision variables (see Sect. 1.3 in the Supplementary material) are:

  • investments in generation technologies: wind, solar, and three thermal technologies;

  • dispatch of generation technologies;

  • investments in and dispatch of generic storage;

  • investments in transmission and dispatch of trade.

The model minimizes annualized investment and operation costs for one year (see the formulation of the objective function, Sect. 1.4 in the Supplementary material). The model does not include hydropower, nor does it allow investment in offshore wind power. These are excluded because hydropower may obscure the effects on investment in storage, and offshore wind power would further differentiate regions. However, offshore wind power would not introduce new dynamics that would alter the qualitative results.

The model is run in:

  • a benchmark version, with a full year of hourly data, i.e., 8760 time steps;

  • versions with fewer time steps, in which representative periods of 1 day (or periods of 5 days for the sensitivity analysis) are selected.

The reduction is explored for temporal reductions of a factor of between 1.8 (4800 time steps) and 365 (24 time steps). In addition, the model is run with a constraint on minimum VRE penetration of 90%, thus creating systems with a heavy dependency on VRE.
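The reduction factors quoted above follow directly from the number of representative days. The short sketch below reproduces the arithmetic; the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760

def reduction_factor(n_days, hours_per_day=24):
    """Ratio of benchmark time steps to time steps in the reduced model."""
    return HOURS_PER_YEAR / (n_days * hours_per_day)

# 1 representative day -> 24 time steps; 200 days -> 4800 time steps.
for n_days in (1, 25, 200):
    print(n_days, n_days * 24, round(reduction_factor(n_days), 2))
```

One representative day gives a factor of 365, and 200 days give 8760/4800 = 1.825, i.e. the factor of roughly 1.8 stated in the text.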

The model formulation of the benchmark and reduced models can be found in Sect. 1 of the Supplementary material, but here we highlight some aspects.

The dispatch variables for generation, export and storage represent the generation during one hour, and are related to the capacity variables in the same way as for a full time model:

$${p}_{i,r,t}\le {n}_{i,r}, \quad \forall i\in {I}_{disp}\cup {I}_{VRE},\; t\in T^{\prime},\; r\in R$$
(1)
$${e}_{r,r^{\prime},t}\le {a}_{r,r^{\prime}}, \quad \forall t\in T^{\prime},\; r, r^{\prime}\in R$$
(2)
$${l}_{i,r,t}\le {c}_{i,r}, \quad \forall i \in {I}_{stor},\; t\in T^{\prime},\; r\in R$$
(3)

where \({p}_{i,r,t}\) is the electricity generated by technology \(i\) in region \(r\) during time step \(t\); \({n}_{i,r}\) is the capacity of technology \(i\) in region \(r\); \({e}_{r,r^{\prime},t}\) is the energy exported from region \(r\) to region \(r^{\prime}\) during time step \(t\); \({a}_{r,r^{\prime}}\) is the transmission capacity between regions \(r\) and \(r^{\prime}\); \({l}_{i,r,t}\) is the storage reservoir level of storage type \(i\) in region \(r\) at time step \(t\); and \({c}_{i,r}\) is the energy capacity of storage type \(i\) in region \(r\). Note that there is no power capacity limit ([MW]) on storage, and hence there is no constraint on the rate of charge/discharge. The storage is therefore not accompanied by a “time” to fill the storage, as is assumed in some other studies, e.g. [4], but is limited only by the energy (in [MWh]) that may be stored.

The storage reservoir level is the initial storage level plus the net sum of charge and discharge during the period, with the set of time steps \(T^{\prime}_{p}, p \in P\), where \(P\) is the set of periods.

$${l}_{i,r,1} + \sum_{t^{\prime}=1}^{t} \left[(1 - {\lambda }_{i}){m}_{i,r,t^{\prime}} - {o}_{i,r,t^{\prime}}\right] = {l}_{i,r,t}, \quad \forall i \in {I}_{stor},\; t \in T^{\prime}_{p},\; r \in R$$
(4)

where \({\lambda }_{i}\) is the loss factor for round-trip storage operation; \({m}_{i,r,t}\) is the charge of storage; and \({o}_{i,r,t}\) is the discharge of storage. Equation (4) is the storage balance equation. Each period has \(\tau\) time steps (e.g. \(\tau =24\) for one-day periods), and the storage level is constrained so that it is the same in the last time step as in the first:

$${l}_{i,r,\tau }={l}_{i,r,1}, \quad \forall i \in {I}_{stor},\; r\in R$$
(5)
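To illustrate how Eqs. (3)–(5) interact within one period, the sketch below tracks the storage level for a single storage unit and checks the energy capacity limit and the cyclic condition. It is a simplified reading of the formulation: it assumes that the initial level is the level before the first time step of the period and that the level cannot become negative (the latter is not stated explicitly above). Names and numbers are hypothetical.

```python
def storage_levels(initial_level, charge, discharge, loss):
    """Storage level after each time step: charge is reduced by the loss factor."""
    levels = []
    level = initial_level
    for m, o in zip(charge, discharge):
        level += (1.0 - loss) * m - o
        levels.append(level)
    return levels

def is_feasible(initial_level, levels, capacity, tol=1e-9):
    """Check the energy capacity limit (Eq. 3) and the cyclic condition (Eq. 5)."""
    within_capacity = all(-tol <= level <= capacity + tol for level in levels)
    cyclic = abs(levels[-1] - initial_level) <= tol
    return within_capacity and cyclic

# Toy period of four time steps [MWh per step], 10% loss on charging.
charge = [10.0, 10.0, 0.0, 0.0]
discharge = [0.0, 0.0, 9.0, 9.0]
levels = storage_levels(0.0, charge, discharge, loss=0.1)
print(levels)
print(is_feasible(0.0, levels, capacity=18.0))
print(is_feasible(0.0, levels, capacity=10.0))
```

Because the level must return to its starting value within each period, energy cannot be carried from one representative period to the next, which is the restriction examined in the sensitivity analysis with five-day periods.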

The energy balances are then formulated for every time step by summing generation, import, and export and constraining that sum to be greater or equal to the demand in that time step:

$$\sum_{i\in {I}_{disp}\cup {I}_{VRE}}{p}_{i,r,t}+\sum_{r^{\prime}\in R}\left(1-{\lambda }_{r,r^{\prime}}\right){e}_{r^{\prime},r,t}-\sum_{r^{\prime}\in R}{e}_{r,r^{\prime},t}+\sum_{i\in {I}_{stor}}\left[\left(1-{\lambda }_{i}\right){o}_{i,r,t}-{m}_{i,r,t}\right]\ge {\Delta }_{rt}, \quad \forall r\in R,\; t\in T^{\prime}$$
(6)

where \({\Delta }_{rt}\) is the demand (inelastic; parameter value taken from statistics) in hour \(\mathrm{t}\) for region \(\mathrm{r}\).

In the objective function, the weights, \({\omega }_{t}\), are part of the running costs for each time step:

$$\min \; \sum_{i \in I} \sum_{r \in R} {\kappa }_{i} {n}_{i,r} + \sum_{i \in I} \sum_{r \in R} \sum_{t \in T^{\prime}} {\omega }_{t} {\nu }_{i} {p}_{i,r,t} + 0.5 \sum_{r,r^{\prime} \in R} {\kappa }_{a} {\theta }_{r,r^{\prime}} {a}_{r,r^{\prime}}$$
(7)

where \({\omega }_{t}\) is the weight for time step \(t\); \({\kappa }_{i}\) denotes the annualized investment costs for generation and storage capacity; \({\kappa }_{a}\) is the annualized investment cost ([€/MW km]) for transmission; and \({\nu }_{i}\) are the running costs.
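The role of the weights in Eq. (7) can be made concrete with a small numerical sketch. It covers a single region and omits the transmission term; all cost figures, capacities and dispatch values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def system_cost(capacity, inv_cost, run_cost, dispatch, weights):
    """Annualized investment cost plus weighted running cost (one region)."""
    investment = sum(inv_cost[i] * capacity[i] for i in capacity)
    running = sum(
        weights[t] * run_cost[i] * dispatch[i][t]
        for i in dispatch
        for t in range(len(weights))
    )
    return investment + running

# Two representative time steps, each standing in for half of the year.
weights = [4380, 4380]                        # sums to 8760
capacity = {"wind": 100.0, "gas": 50.0}       # [MW]
inv_cost = {"wind": 100000.0, "gas": 50000.0} # [EUR/MW/yr], hypothetical
run_cost = {"wind": 0.0, "gas": 60.0}         # [EUR/MWh], hypothetical
dispatch = {"wind": [80.0, 20.0], "gas": [0.0, 40.0]}  # [MWh per time step]

total = system_cost(capacity, inv_cost, run_cost, dispatch, weights)
print(total)
```

Investment costs enter once per unit of capacity, while each time step's running cost is scaled by the number of original hours it represents.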

2.3 Data

The input consists of regional demand and wind and solar time series and is generated using the methodology described in Mattsson et al. [33]. Table 1 shows the fixed and operational costs used as input to the model. The costs are annualized into fixed costs ([€/MWh * yr]) using a social discount rate of 5% and running costs ([€/MWh]), which comprise fuel costs and O&M costs. The specific costs are less relevant for this study, since the focus is not to describe or predict any energy system in particular, but rather to compare methodological choices. The costs for wind, solar PV and batteries were varied to generate a total of 27 cost combinations. This serves the purpose of generating an ensemble of system configurations, in order to explore more possible outcomes regarding deviations from the benchmark results.
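As a rough illustration of the two data-preparation steps mentioned above, the sketch below annualizes an overnight investment cost with the standard annuity (capital recovery) factor and enumerates cost combinations. The lifetime, the overnight cost, and the choice of three cost levels per technology (which gives 3 × 3 × 3 = 27 combinations) are assumptions made for the example and are not taken from Table 1.

```python
from itertools import product

def annuity_factor(rate, lifetime_years):
    """Capital recovery factor: share of the overnight cost paid each year."""
    return rate / (1.0 - (1.0 + rate) ** (-lifetime_years))

DISCOUNT_RATE = 0.05  # social discount rate stated in the text

# Hypothetical example: 1,000,000 EUR/MW overnight cost, 25-year lifetime.
annual_cost = 1_000_000 * annuity_factor(DISCOUNT_RATE, 25)
print(round(annual_cost))

# Hypothetical cost multipliers: three levels for each varied technology.
levels = {
    "wind": [0.5, 1.0, 1.5],
    "solar": [0.5, 1.0, 1.5],
    "battery": [0.5, 1.0, 1.5],
}
combinations = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(combinations))
```

Each element of `combinations` would correspond to one model run per number of representative days, plus one benchmark run with 8760 time steps.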

Table 1 The costs and technical data that are input to the model

2.4 Sensitivity analysis

The period length, which was set to 1 day (24 h) in the base case, was prolonged to five days (120 h) for the sensitivity analysis. Using a longer period would reduce the error introduced by constraining the storage level to be the same at the beginning and the end of a period (Eq. 5). By extending the period from one to five days, this constraint limits storage operation to a lesser extent, and may reveal whether a large part of the discrepancy is in fact due to this constraint, at the same time as providing a basis for a decision for future modelers regarding the length of the period (e.g. one day, one week etc.).

3 Results

This paper focuses on the outputs system cost, total capacity mix, and regional capacity mix. The reason for the breakdown into these three categories is that energy system studies are often focused on any or all of these, see the Discussion section of this paper. As it turns out, the accuracy that may be achieved, given a certain number of representative periods, differs substantially between these three categories. The accuracy is measured as deviation (in percent) from the value obtained by the benchmark version (8760 chronological time steps) of the model. The results were computed for several cost combinations, in order to explore a wider range of possible deviations from the benchmark.
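The accuracy measure can be sketched as follows. The example expresses the deviation as a ratio to the benchmark (1 means no deviation, as in the figures) and finds the smallest tested number of days from which the deviation stays within a tolerance. The deviation values are invented for illustration and are not results from the study; the function names are hypothetical.

```python
def deviation(value, benchmark):
    """Ratio of reduced-model output to benchmark output (1.0 = no deviation)."""
    return value / benchmark

def days_to_stay_within(deviation_by_days, tolerance=0.10):
    """Smallest tested number of days from which all results stay within tolerance."""
    required = None
    for days in sorted(deviation_by_days, reverse=True):
        if abs(deviation_by_days[days] - 1.0) <= tolerance:
            required = days
        else:
            break  # a smaller number of days violates the tolerance
    return required

# Invented deviations for one cost combination (not results from the paper).
example = {1: 0.70, 5: 0.85, 10: 0.93, 25: 0.97, 50: 1.02, 100: 0.99}
print(days_to_stay_within(example, tolerance=0.10))
print(days_to_stay_within(example, tolerance=0.05))
```

Requiring the deviation to stay within the tolerance for all larger samples, rather than merely touch it once, matches the notion of a quantity "stabilizing" below a given deviation.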

3.1 System cost

Figure 1 shows the deviation from the benchmark system cost (sample system cost/benchmark system cost) induced by using the sample representative days method for between 1 and 200 days. Each colored line represents one of 27 cost scenarios. As the figure shows, the system cost deviation results for the scenarios align rather well, especially for 40 representative days and over. This means that the accuracy of the time reduction depends only to a very small degree on the assumptions on technology costs. An accuracy of 10% is achieved already at ~ 10 representative days, while ~ 50 days and above ensure a system cost discrepancy of a mere few percent.

The results demonstrate that models with a reduced temporal representation may over- or underestimate system cost, but underestimates seem more prevalent, especially for cases with few representative days. This is likely because the range of variability of generation is represented more accurately with more days, and because greater variability incurs additional costs, such as investment costs for back-up capacity.

3.2 Total capacity mix

Figure 2 shows the deviation from benchmark capacity for a) wind power, b) solar power, c) battery storage and d) transmission capacity at a penetration level of VRE of 90%. The capacities are the totals for all of Europe.

Fig. 2 Deviation from the benchmark value for total capacity of (a) wind power, (b) solar power, (c) battery storage and (d) transmission. The figures show deviation from the benchmark model results, where 1 means that there is no deviation. Each line represents one technology investment cost combination, see the Method section and Table 1

The first observation is that the capacities display greater deviation from benchmark than does cost. While < 10% deviation for cost was achieved at ~ 10 days, generation capacities (wind, solar) do not stabilize at < 10% deviation until 80 days or more. Transmission capacity does not stabilize at < 10% deviation even for 200 representative days, which was the maximum number of days tried here. With the previously proposed [26, 30] sufficient number of representative days, ~ 25 days, the errors in total capacity are ~ 10% (wind) and ~ 25% (solar, transmission, storage), see Fig. 2.

The second observation is that capacities are not equally volatile: solar capacity deviates by more than wind, and storage- and transmission capacity by even more. The fact that wind capacity results are more stable than are solar results may be because the total wind capacity is greater, so for some cost combinations, the solar capacity is rather low, and thus a small nominal change induces a larger percentage change for solar than it does for wind.

3.3 Regional capacity mix

This section displays results for individual regions, focusing on wind-, solar- and storage capacities, since these, together with transmission, dominate the system with 90% VRE generation.

Figure 3 shows the deviation from the benchmark regional capacity for a) wind, b) solar and c) storage. In order not to give importance to negligible amounts of capacity, data points for regions where the generation wind/solar contributes less than 10% of the total demand are discarded. Similarly, data points for regions with storage capacity less than the equivalent of one hour’s regional demand are discarded. For each model version (1, 2, … 200 representative days), there are thus a maximum 216 data points (27 cost combinations times eight regions) for each technology.
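The filtering rule for the regional data points can be sketched as below. The example assumes that "total demand" refers to the annual demand of the region in question, and the data points are invented for illustration; field and function names are hypothetical.

```python
def regional_deviations(points, min_share=0.10):
    """Keep regions where the technology supplies at least min_share of demand
    and return capacity relative to benchmark for the remaining points."""
    kept = []
    for point in points:
        if point["generation"] < min_share * point["demand"]:
            continue  # negligible contribution: discard the data point
        kept.append(point["capacity"] / point["benchmark_capacity"])
    return kept

# Invented data points for one technology (energy in TWh, capacity in GW).
points = [
    {"generation": 50.0, "demand": 200.0, "capacity": 30.0, "benchmark_capacity": 20.0},
    {"generation": 5.0, "demand": 300.0, "capacity": 2.0, "benchmark_capacity": 8.0},
    {"generation": 40.0, "demand": 100.0, "capacity": 10.0, "benchmark_capacity": 20.0},
]
print(regional_deviations(points))

# Upper bound on data points per technology and model version.
print(27 * 8)
```

The second invented point is dropped because the technology covers less than 10% of demand there, so a large relative error on a negligible capacity does not enter the figure.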

Fig. 3 The deviation from benchmark for regional (a) wind power, (b) solar power and (c) storage capacity. The figures show deviation from the benchmark model results, where 1 means that there is no deviation. Each dot represents one data point, where a data point is the deviation for one region in one of the 27 cost scenarios

All three capacity types show instances of large deviations, of more than 40%, all the way up to 200 representative days. For 25 days or more, the deviation of regional capacities may be above 200%.

The supplementary material contains similar figures (Figs. 3 and 4 in the Supplementary material), which include only data points for which the technology (wind or solar) generates more than 40% of the regional demand. Even when all but the data points representing very substantial parts of the regions’ capacity mix are thus excluded, the regional capacity layout for reduced temporal models is highly erroneous. The overall picture conveyed here is that regional capacities from models with reduced temporal resolution should not be trusted to be correct.

3.4 Sensitivity analysis

The model was also run with five-day periods (instead of one-day periods as in the base case). This sensitivity analysis is of interest because it may reveal the error introduced by the storage formulation when the cost of storage is such that it may be assumed to be a vital part of the modeled power system. The results show that the deviation in system cost, for the same number of hours, is slightly higher when using five-day periods, compared to one-day periods (see Fig. 2 in the Supplementary material). Thus, for the range of costs represented in this investigation, it seems that the error introduced by the constraint on the storage operation to 24 h is smaller than that induced by forcing (five) consecutive days, and thus limiting the representation of variability.

3.5 Summary of results

Table 2 summarizes the results by measuring the deviation from benchmark results at 25 representative days, as well as the number of days necessary to predict quantities within a 10% deviation from benchmark results. The table shows the worst case, i.e., the deviation for the penetration level with the greatest deviation from benchmark. The comparison shows that models with a reduced temporal representation of 25–100 days predict system cost and total VRE capacities within a range of 10–20% of benchmark. Regional values for wind, solar, transmission, and storage capacities are poorly predicted by models with reduced temporal representation. Using a model with the proposed number of days, ~ 25 days [30], induces errors so large that little may be said about regional capacities.

Table 2 Summary of results for the three categories discussed here: system cost, total capacity mix, and regional capacity mix

4 Discussion

The main finding of this paper is that the number of representative days needed for energy system models to provide accurate results differs greatly depending on the focus of the study: system cost is well represented with few days, while optimal regional policy is highly volatile under a temporal reduction. We have shown this by reducing the temporal dimension and measuring the error regarding system cost, total capacity and regional capacity. We believe the distinction between these quantities (system cost, total capacity mix, regional capacity mix), and the difference regarding their accuracy under a reduction of the temporal dimension, to be an important one, and one that has not been properly addressed in the previous literature.

Our results indicate that using a model with reduced temporal representation is a valid method to investigate questions relating to system cost. The fact that the system cost in reduced time models comes close (15–20% for 4–10 days, 2–5% for 30 days or more) to benchmark results should not be surprising: As long as there is some representation of variability, the model is likely to capture the fact that serving demand with VRE generation requires additional flexibility (transmission, storage, flexible thermal), all of which generate higher costs than the mere technological LCOE of VRE. In contrast, models using a temporal representation based on averaging may display very large deviations from benchmark even for the system cost [26]. Studies that use the representative days method with at least 30 days and focus on system cost may thus be relatively sure that their estimates are in the right approximate range. However, using 48 time steps as in Knopf et al. [18] or 6 days, as in Osorio et al. [25], may induce much larger errors. This is especially the case since many studies compare decarbonized scenarios with BaU scenarios, where a lot of thermal generation remains in the mix. Since it is the irregular variation on the generation side, i.e. a large share of VRE, that incurs the errors due to temporal representation, this may bias studies to assess renewable scenarios as being less costly than they actually are.

The results regarding both global (sum for all regions) and regional capacities should give pause to modelers using models with reduced temporal representation: For the total capacity mix, we show that the deviation from benchmark results may be large (> 20%) for solar, storage and transmission capacity, at 25 representative days. Such a discrepancy is larger than, yet rather close to, the estimate in [28, 30, 35]. For fewer days, such as in references [25] (6 days), [12] (4 days), [18] (48 time steps), [36] (12 days) and [37] (12 days), the discrepancy may be 50% or more. Yet, these studies typically do not mention the uncertainty range due to the temporal representation.

The regional capacity mix has, to our knowledge, not been the topic of any previous paper on temporal reduction. We show that it displays large deviations from benchmark results, even for models with ~ 100 representative days. This result is more similar to those in reference [29], where the equivalent of ~ 150 days was necessary in order to come within 10% deviation for the statistical measure used. Even though the measure employed by Merrick [29] differs from the regional capacity mix in this study (Merrick’s test model has only one node and thus there is only one region), they both point to the possibility that CEMs with reduced temporal representation are not fit to use for all types of analysis. Thus, one may view the regional capacity mix investigated in this paper as but one example of a volatile output, and volatile outputs are likely not limited to just the regional capacity mix. Based on the findings of this study, in order to at least eliminate the time reduction as a source of volatility, we may lean towards using models with full time resolution (as done in e.g. [21,22,23]). In addition to providing a better representation of variability, such models also retain the chronology required to represent long-term storage.

In addition, a temporal reduction is clearly one perturbation that is a source of volatility in output, but there may be others. In fact, there is reason to believe that the large variation in regional capacity mixes arises because there are simply many regional capacity configurations that give rise to near-optimal solutions (flat objective function around the optimum). If such is the case, many types of perturbations may give rise to a different optimal configuration. Reducing the time dimension may then be viewed as one type of perturbation, but there may be other perturbations that also yield solutions that are near optimal, yet quite different in terms of system configuration. This flatness around the optimum is investigated in more depth in Neumann and Brown [38]. Another example is Zeyringer et al. [39], who investigated the effect of using different years for the wind and solar input, and found that system cost is affected by only a few percent, while the optimal regional capacity mix displayed a very large range between the years. A systematic investigation of the effect of different perturbations thus seems essential to find robust regional strategies for renewable power systems. While outside the scope of this study, this issue has been explored for several energy system models [40].

4.1 Limitations

This study was performed with a model that is stylized compared to most models used for policy support. This applies to both the number of technologies and their level of detail, as well as other constraints on operation, self-sufficiency, policy options etc. that a modeler may choose to include in a study. Specifically, the storage option was defined in terms of the energy it would be able to store ([MWh]), but not by the speed at which it could charge or discharge ([MW]). Although the simplifications were aimed at making the investigation more general than it would have been had we used an existing model, they may also have contributed to the unstable nature of the results under a reduction of the temporal dimension. An example of a feature that a real model study may have included is the EU 2030 national targets, where each country may pledge e.g. a certain share of renewable electricity. This is explicitly a constraint on the regional capacity, which of course reduces the effect of a temporal reduction on the same quantity. Similar arguments may be constructed for other constraints. Thus, our model results may indicate an upper limit for the effect on results of a reduction of the temporal dimension, and the exact quantification may not be applicable to the general energy system study. Yet, we believe that the tendency shown here, i.e. that system cost is fairly stable under a reduction of the temporal dimension, while the capacity mix and, especially, the regional capacity mix, are considerably more unstable, nevertheless holds.

Regarding the sampling method, this study used hierarchical clustering and tested two period lengths under which the hours were consecutive, namely one-day periods (24 consecutive hours) and five-day periods (120 consecutive hours). Reichenberg et al. [26] showed that a clustering method yielded considerably less deviation compared to random selection of hours. References [41, 42] showed that hierarchical clustering and other clustering methods (k-means) yielded similar results, even though Teichgraeber et al. [43] found more diverse results with respect to a wider range of clustering methods, albeit for a small number (up to nine) of clusters. At the same time, Marcy et al. [44] found that clustering techniques were superior to other techniques for finding representative periods. The question is then whether another clustering method would have yielded results with a smaller error than our choice of hierarchical clustering. A further investigation of clustering methods was, however, outside the scope of this paper and is left for future research.

5 Conclusions

This study assesses the accuracy of CEMs with reduced temporal representation in terms of three different quantities: system cost, total capacity mix, and regional capacity mix. It does so by comparing results from models with reduced temporal representation using 1 to 200 one-day periods, to results from a benchmark model that uses a full year’s worth of data.

We show that the number of representative days necessary to use CEMs with reduced temporal representation to predict system cost, capacity mix, and regional capacity mix differs for these three quantities.

  • For system costs, the deviation from benchmark results can be kept below 5% by using ~ 25 days.

  • For the total capacity of the most important components of a renewable system (wind, solar, storage, transmission), deviations of less than 5% require between 65 and 200 days. To guarantee no more than 20% deviation from benchmark results, ~ 50 days are required.

  • The regional capacity mix is highly erroneous with frequent values deviating more than a factor of 2 from the benchmark all the way up to 200 representative days.

Hence, CEMs with reduced temporal representation of ~ 25 days may be well-suited to assess, e.g., the cost of a future renewable power system, since 5% cost deviation is small given the impact on results due to other uncertainties, such as technology development, social acceptance and political feasibility. However, such a temporal representation seems less suited to detail the optimal regional allocation of generation technologies in such a system. Still, there are a fair number of studies that have used reduced temporal resolution and analyzed quantities that we have shown they were not apt to analyze. Therefore, researchers and policy makers should exercise care when drawing conclusions regarding a regional capacity mix from CEMs with reduced temporal representation.

We therefore recommend that researchers working with energy system models exercise caution in performing any detailed analysis of the capacity mix, whether regional or total. Regarding the temporal resolution, we recommend using CEMs covering a full year of operation.

Finally, we find it disturbing that a rather small reduction in the temporal representation may create misleading results for regional capacity mixes. Further research is therefore needed to analyze the extent to which other simplifications (for instance cost structure, weather data, etc.) affect results in CEMs. The modeler should take additional care when analyzing regional results to make sure they are not model artefacts resulting from simplifications in the models.