1 Introduction

Uncertainties play a key role in projecting future developments of the energy system. At least two factors contribute to this: (1) the energy system is determined by complex interactions of a wide range of drivers and (2) there is a lack of empirical data. Factors that influence future energy demand and supply include economic activity, developments in economic structure, lifestyle changes and technology development. Our understanding of the interaction of these factors is still limited (and each may span a wide range of possible outcomes). On top of this, the lack of empirical data complicates the development and calibration of models, especially for developing regions.

Despite limitations in both theory and data availability, a wide range of models has been developed to explore trends at global, regional and national scales. These models are partly developed from different scientific paradigms, which may lead to different interpretations of the past and different expectations of the future [32, 48]. A classic example is the difference between models from a macro-economic tradition (top–down) and those from a technological tradition (bottom–up). These two traditions tend to interpret the present situation differently with respect to energy efficiency (‘improvement of energy efficiency leads to higher costs’ vis-à-vis ‘major opportunities for improvement without substantial costs’) and as a result also expect different mitigation costs in the future [23]. Even within one model, however, often different options exist on how to interpret the current and past situation. For instance, macro-economic demand functions often include both income elasticity and price elasticity, which are hard to identify unambiguously in historic data. A different interpretation of the past may lead to different calibrations of the model and uncertainty in future projections. So far, different methods have been used to explore uncertainty in global energy models [12, 30, 53, 65], but relatively little attention has been given to the influence of model calibration on future projections.

The issue of multiple model calibration is closely related to the concept of equifinality, which focuses attention ‘on the fact that there are many acceptable representations that cannot easily be rejected and should be considered in assessing the uncertainty associated with predictions’ [7]. These ‘acceptable representations’ are called behavioural. The “acceptance criterion” can be defined in strictly quantitative terms (e.g. above a threshold value of a likelihood measure) or more qualitatively (e.g. reproduction of trends). At present, calibration of energy models is often done on the basis of the modeller’s expert knowledge to identify a single set of plausible parameter values. However, if multiple sets of parameter values are tenable and model projections are sensitive to the parameter values chosen, this practice is questionable [17].

In this context, we have developed a method to automatically calibrate models and obtain sets of parameter values that perform reasonably against historic data. These calibrated sets are obtained by varying the main model parameters within a limited range, choosing an initial estimate in this range and searching consecutively for a (local) optimum to minimise the error between observations and model results. Repeating this procedure many times, initialised at different locations in the parameter space, generates a series of (different) calibrated sets of parameter values. This method is related to both nonlinear regression methods like parameter estimation (PEST) [16] or UCODE [43] and (sequential) Monte-Carlo-based approaches like generalised likelihood uncertainty estimation (GLUE) [8] or SimLab [49].

We apply this method to the energy demand module of the global energy model The IMAGE Energy Regional Model (TIMER) 2.0, a system dynamics model that simulates developments in global energy supply and demand [14, 67]. The TIMER 2.0 model is the energy sub-model of the integrated model to assess the global environment, IMAGE 2.4, that describes the main aspects of global environmental change [10]. In recent years, this model has been used in several global scenario studies like the IPCC Special Report on Emission Scenarios (SRES) [25], the Millennium Ecosystem Assessment [33], UNEP Global Environmental Outlook [54] and the Organisation for Economic Co-operation and Development (OECD) Environmental Outlook [40].

Based on the above considerations, the main research question of this paper is whether equifinality can be observed in the calibration of the TIMER model and, if so, what it means for future projections. It should be noted that several uncertainty studies have been performed with the TIMER model [55, 61, 64]. These analyses accepted the model’s initial calibration and focused on the spread in model outcomes based on variation in central input values. Moreover, they (as is the case for other global energy models) focused on the global level, neglecting interesting underlying trends in different regions. Recent analysis of TIMER found that uncertainty in energy demand trends is a major source of model uncertainty [65]. Therefore, we focus this analysis on the TIMER energy demand sub-model. Within energy demand modelling, a further choice was made to focus on the transport sector, which is the sector with the fastest growth in energy use. With respect to regions, we focus on Western Europe and India, to represent both an industrialised and a developing region.

In this paper, first in Section 2 we discuss the role of uncertainty in energy modelling and introduce a methodology to capture uncertainty in model calibration. In the second part of the article, we elaborate on the application of the method: Section 3 describes the structure of the TIMER 2.0 energy demand model and selects parameters that are useful for model calibration. Section 4 presents the results of the analysis, Section 5 evaluates the presented methodology and Section 6 discusses and concludes. The underlying details of this paper, including mathematical descriptions, parameter ranges, more in-depth analysis of results and application to the USA, Brazil, China and Russia can be found in [63].

2 Uncertainty in Model Calibration

2.1 Uncertainty in Energy Models

Exploration of different futures on the basis of models is complicated by inherent uncertainties [19, 31, 45–47, 57–60]. Uncertainty and associated terms (such as error, risk and ignorance) are defined and interpreted differently by different authors [for reviews see 28, 46, 56, 68]. These different definitions partly reflect the underlying traditions and their associated scientific-philosophical ways of thinking. In general, uncertainty may be identified in input parameters, in model structure or even in different theories at a more aggregated level. Some of these uncertainties are related to natural randomness (ontic); others result from limited knowledge (epistemic). One phase of model development where uncertainties become apparent is model calibration. Model calibration and validation are of critical importance. As Oreskes et al. [42] highlight, “In areas where public policy and public safety are at stake, the burden is on the modeller to demonstrate the degree of correspondence between the model and the material world it seeks to represent and to delineate the limits of that correspondence.” However, given existing uncertainties, in most cases historic trends and data can be interpreted in different ways. This is also emphasised by Beck [4], who noted that almost all models suffer from a lack of identifiability, i.e. many combinations of values for the model’s parameters may permit the model to fit the observed data more or less equally well.

The notion of ambiguity in model identification and calibration can be valued differently [3, 18]. In statistical modelling traditions, ambiguity in model calibration is typically interpreted as over-parameterisation of the model. Following Occam’s razor, this could be solved by model reduction [11, 26, 71, 72] or by developing multiple specialised models [5] to strike a balance between model complexity and data availability. In rule-based (system-dynamic) and engineering models, the model structure is based on (intuitive) causal relations and rules (either in physical or in monetary terms) that are calibrated to historic data [15, 41]. Such causal relations may be postulated even in the absence of sufficient data for calibration. Beven [7] aims to extend traditional schemes with a more realistic account of uncertainty and rejects the idea that a single optimal model exists for any given case. Instead, models may not be unique in their accuracy of both reproducing observations and making predictions (i.e. they may be unidentifiable or equifinal) and are subject only to conditional confirmation, due to e.g. errors in model structure, the calibration of parameters and the period of data used for evaluation.

In energy modelling literature, the most analysed sources of uncertainty are parameters and model structure in direct relation with future projections of model drivers. As a typical example, Tschang and Dowlatabadi [53] deal with input parameter uncertainty when performing an uncertainty analysis of the Edmonds–Reilly global energy model. They use Bayesian updating techniques to filter out model simulations that do not conform to outputs on energy consumption and carbon emissions and determine updated prior distributions for several core parameters. Van Vuuren et al. [65] use a slightly more complicated method, in which sampling of input parameters is made conditional upon different consistent descriptions of the future. With respect to model structure, an example is provided by Da Costa [12] who compares the results of two different energy models for Brazil. He concludes that although the aggregate results of these models are comparable, considerable differences exist when the results are broken down.

This study focuses on uncertainty that originates from the calibration of parameter values. We explore whether acceptable sets of parameter values in model calibration (so-called behavioural sets) can be identified for the TIMER energy demand model and what these imply for the model’s projection, inspired by Beven’s work on equifinality.

It should be noted that the mismatch between model prediction and observation can stem from many different sources [7], including measurement error and random error, but also the representation of reality by the model as a result of both parameter error and model structure error. To keep our analysis manageable, we assume here that the parameter error is the dominant error component and focus on the question of whether our calibration procedure can indeed identify multiple, equally valid calibrations of the energy demand model. Techniques exist to overcome this simplification and to better deconstruct the mismatch between observation and prediction into the six constituent error terms of Beven [7], but this is beyond the scope of the present paper.

2.2 Methodology to Identify Calibrated Sets of Parameter Values

We developed an automated parameter estimation procedure in order to explore the impact of different sets of parameter values on model outcomes. The aim of the developed parameter estimation methodology is twofold. First, it is an automated model calibration procedure that minimises the error between model results and observations, generating a set of calibrated parameter values. In this sense it is related to nonlinear regression methods like PEST [16] or UCODE [43]. Second, by repeatedly applying the method it can be used to perform an uncertainty analysis on model calibration. This generates a series of calibrated sets of parameter values. This aspect is more related to (sequential) Monte-Carlo-based methods like GLUE [8] or SimLab [49]. The procedure closely follows the manual model calibration process that is normally applied to the TIMER model. This method involves several steps:

  A. Determining useful parameters for model calibration and their associated ranges

  B. Performing a series of model calibrations and identifying sets of input parameters that perform well against historic data

  C. Analysing the sets of calibrated parameter values

  D. Analysing the impacts of calibration uncertainty on future projections

2.2.1 Determining Useful Parameters for Model Calibration and Their Associated Ranges

The first step of the method involves analysis of the model, to select useful parameters for the model calibration process. We also identify ranges for the calibration parameters, based on analysis of the model formulation, the values used in former calibrations, literature and expert judgement. This step is described in detail in Section 3. These ranges are used as boundaries in the parameter estimation process.

2.2.2 Performing a Series of Model Calibrations and Identifying Sets of Input Parameters that Perform Well Against Historic Data

Criteria for Calibration Fit

Several measures exist to evaluate the deviation between model results (predictions, P) and observed data (O), of which an overview can be found in Janssen and Heuberger [27]. We choose to use the normalised root mean square error (NRMSE), comparing individual time series of observations and predictions, and defined as:

$$ \text{NRMSE} = \sqrt{\frac{\sum_{t=1}^{T} \left( \frac{P_t - O_t}{O_t} \right)^2}{T}} $$
(1)

In this equation, $P_t$ and $O_t$ indicate the predicted and observed value in year t and T is the number of years in the time series. This measure has values between zero (perfect fit) and infinity (random). Multiplied by 100, the NRMSE can be seen as the time-averaged percentage deviation between the time series of model results and the time series of observations. A threshold level for the NRMSE can be defined below which models are called behavioural with the data (e.g. NRMSE < 10%), but in [63] we show that it is hardly possible to find criteria for such a general numeric threshold.

We use the NRMSE for several reasons. First, it expresses model error at the level of individual data points. The alternative, expressing model error at the average level, only provides a rough impression of the model–data discrepancy and averages out the dynamic features [27], whereas in calibration one wants to simulate both trends and patterns in the data. Second, the NRMSE can easily be normalised in each year to the observed energy use, to prevent years with higher energy demand from dominating the estimated overall error.
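As an illustration, Eq. 1 translates directly into code. The short Python sketch below is ours (the paper itself relies on MATLAB); the function name and the example series are purely illustrative.

```python
import numpy as np

def nrmse(predicted, observed):
    """Normalised root mean square error (Eq. 1): time-averaged
    relative deviation between model results and observations."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean(((p - o) / o) ** 2))

# Illustrative check: a uniform 5% over-prediction gives NRMSE = 0.05,
# i.e. 5% when multiplied by 100.
obs = np.array([10.0, 11.0, 12.5, 13.0])
print(nrmse(1.05 * obs, obs))  # -> 0.05
```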

Series of Model Calibrations

As a starting point for the parameter estimations, we use an initial dataset ($SI$) of N parameter estimation attempts for P parameters: $SI_{P,N}$ (i.e. for the parameters and ranges identified in the previous step). We use a combination of design of experiments (a central composite design [39], to explore the extremes of the parameter space) complemented with a series of random starting points. In the model calibrations, the input parameters are varied in order to minimise the NRMSE, starting at the locations in the parameter space defined in the dataset $SI_{P,N}$. We look for optimal parameter estimates using a MATLAB built-in function for constrained nonlinear optimisation, based on sequential quadratic programming [37]. This algorithm treats the model as a black-box objective function and varies the parameter values until the derivative of the objective function (i.e. the NRMSE) reaches a value between zero and a pre-defined threshold level. This results in a dataset of calibrated parameter values that have a good (or best obtainable) fit with observations of energy use for the period 1970–2003: $SC_{P,N}$. This can best be imagined as the collection of local optima in the objective function landscape spanned by the explored parameter space.
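The multi-start calibration step can be sketched as follows. This is not the paper’s MATLAB implementation: SciPy’s SLSQP solver stands in for the MATLAB constrained SQP routine, a Latin hypercube stands in for the combination of a central composite design and random points, and `run_model` is a placeholder for a function that runs the TIMER demand model with a candidate parameter vector and returns the simulated energy-use time series. The `nrmse` function is the one from the previous sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import qmc

def calibrate_from_start(x0, bounds, observed, run_model):
    """One parameter estimation attempt: minimise the NRMSE (Eq. 1)
    starting at x0, with the ranges from step A as box constraints."""
    objective = lambda x: nrmse(run_model(x), observed)
    res = minimize(objective, x0, method="SLSQP", bounds=bounds,
                   options={"ftol": 1e-6, "maxiter": 200})
    return res.x, res.fun  # calibrated parameter vector and its NRMSE

def build_calibrated_sets(bounds, observed, run_model, n_attempts=100, seed=1):
    """Repeat the local optimisation from many starting points (SI_{P,N})
    to collect the series of calibrated parameter sets (SC_{P,N})."""
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    # Latin hypercube starts as a simple stand-in for the paper's mix of a
    # central composite design and random points.
    unit = qmc.LatinHypercube(d=len(bounds), seed=seed).random(n_attempts)
    starts = lo + unit * (hi - lo)
    return [calibrate_from_start(x0, bounds, observed, run_model)
            for x0 in starts]
```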

2.2.3 Analysis of Calibrated Parameter Values

We analyse the series of calibrated sets of parameter values in $SC_{P,N}$ in several ways. First, the distribution of the calibrated parameter values over their range is analysed. Second, we plot the calibrated parameter values against the NRMSE (see Fig. 3, upper graphs). Relations between parameters and the impact of parameters on the NRMSE can be expressed numerically by the (linear) Pearson correlation coefficient between parameters. We use this as the simplest indicator of a relation between two parameters, although it does not capture non-linearity or the existence of multimodal distributions.

Based on this, behavioural sets of parameter values can be selected. The most straightforward method is based on the NRMSE value, for instance, one can decide to call sets of parameter values with NRMSE < 10% behavioural. An alternative, but less reproducible criterion is based on visual inspection of the parameter values and the observed and simulated time series of energy demand. In our analysis, we decided not to remove any sets of parameter values based on non-behavioural outcomes. However, we use the NRMSE (hence, behavioural/non-behavioural) to weight future projections that are derived from the different sets of parameter values.
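A minimal sketch of this analysis step, assuming the calibrated sets are stored as an N × P array and the corresponding NRMSE values as a vector (function and variable names are ours):

```python
import numpy as np

def analyse_calibrated_sets(sc, errors, threshold=0.10):
    """Flag behavioural sets (e.g. NRMSE < 10%) and compute the linear
    Pearson correlation between each parameter and the NRMSE."""
    sc = np.asarray(sc, dtype=float)          # shape (N, P)
    errors = np.asarray(errors, dtype=float)  # shape (N,)
    behavioural = errors < threshold
    corr_with_error = [np.corrcoef(sc[:, j], errors)[0, 1]
                       for j in range(sc.shape[1])]
    return behavioural, corr_with_error
```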

2.2.4 Analysing the Impacts of Calibration Uncertainty on Future Projections

To analyse the impact of different parameter values on future projections of the model, we use the series of calibrated sets of parameter values in $SC_{P,N}$ to run the model forward for the period 2003–2030, using the same scenario for the model drivers (see Section 4.2). This leads to a range of projected future energy use, based on the different sets of parameter values. We analyse this in a frequency diagram of energy use in 2030 and weigh the frequencies in the diagram relative to the NRMSE of the parameter set that obtained the best fit to historic data in $SC_{P,N}$ (implicitly assuming that sets of parameter values with a better fit to historic data lead to more plausible future projections). The weight ($W$) that the Nth calibrated parameter set gets in the prediction ensemble is defined as the normalisation of its relative weight ($R$) with respect to the best performing parameter set:

$$ W_N = \frac{R_N}{\sum_N R_N} \quad \text{where} \quad R_N = \frac{\text{NRMSE}_{\text{best}}}{\text{NRMSE}_N} $$
(2)
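Equation 2 can be implemented in a few lines; the function name and the example error values below are illustrative only.

```python
import numpy as np

def projection_weights(errors):
    """Weights of Eq. 2: each calibrated set is weighted by the NRMSE of the
    best-fitting set relative to its own NRMSE, normalised to sum to one."""
    errors = np.asarray(errors, dtype=float)
    relative = errors.min() / errors   # R_N = NRMSE_best / NRMSE_N
    return relative / relative.sum()   # W_N

# Example: sets with NRMSE of 2.8%, 5% and 10% get weights of
# roughly 0.54, 0.30 and 0.15.
print(projection_weights([0.028, 0.05, 0.10]))
```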

In the remainder of this article, we apply this method to the transport sector energy use model of TIMER 2.0.

3 The TIMER Energy Demand Model

The global energy model TIMER includes both demand and supply of energy [14, 64, 67]. Because of the many feedbacks, interactions and sub-modules, the TIMER model as a whole is too complex to analyse the uncertainty from calibration. Therefore, we here confine the analysis to the sub-model that simulates the demand for energy on the basis of economic activity and autonomous and price-induced efficiency improvements.

Energy use is first modelled as the annual demand for useful energy (UE, in GJ/year, see Fig. 1), which is converted to secondary energy use using specific efficiencies for different fuels. Useful energy demand is modelled as a function of four dynamic factors: structural change, autonomous energy efficiency improvement (AEEI), price-induced energy efficiency improvement (PIEEI) and price-based fuel substitution. Thus:

$$ \text{UE}_{R,S,F(t)} = \text{POP}_{R(t)} \cdot X_{R,S(t)} \cdot Y_{R,S,F(t)} \cdot \text{AEEI}_{R,S,F(t)} \cdot \text{PIEEI}_{R,S,F(t)} \quad (\text{GJ/yr}) $$
(3)

in which POP is the population (in persons), X is the per capita economic activity of a sector (in purchasing power parity (PPP), constant 1995 international $/capita/year), the useful energy intensity (Y, in GJ/$) captures intra-sectoral structural change and the AEEI and PIEEI multipliers (dimensionless) represent autonomous and price-induced efficiency improvements. The indices R, S and F indicate region, sector and energy form (heat or electricity), respectively.
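For clarity, Eq. 3 is simply the product of these five factors for a given region, sector and energy form; the short Python function below (argument names and the illustrative numbers are ours) restates it.

```python
def useful_energy_demand(pop, activity_pc, intensity, aeei, pieei):
    """Eq. 3: useful energy demand (GJ/yr) for one region, sector and energy
    form as the product of population, per capita activity, useful energy
    intensity and the two efficiency multipliers."""
    return pop * activity_pc * intensity * aeei * pieei

# Illustrative numbers only: 400 million people, 20,000 $/capita/yr,
# 0.002 GJ/$, 10% autonomous and 5% price-induced efficiency improvement.
print(useful_energy_demand(4e8, 2e4, 0.002, 0.9, 0.95))  # ~1.37e10 GJ/yr
```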

Fig. 1 Overview of the TIMER energy demand model and identification of model inputs, output and parameters used to determine calibration uncertainty

Statistical time series are available for several input variables (economic activity, fuel prices and market shares) and for the output variable: final energy use. Between these observed variables, the model tells a story of useful energy intensity (structural change) and autonomous and price-induced efficiency improvements, aggregates that can hardly be measured in the real world. The multiplicative structure of this model leaves room for different behavioural sets of parameter values: for different implementations of UEI, AEEI and PIEEI, a similar result can be obtained for the observed time series of final energy use.

This generic model is used for the five economic sectors in TIMER. In this analysis we look specifically at the transport sector implementation of the model. We determine calibration uncertainty against the total demand for transport energy as provided by International Energy Agency (IEA) data. Data for energy prices are derived from the IEA and data on economic activity are obtained from the World Bank WDI [70]. We equate energy demand and energy use, as the statistical data are assumed to reflect satisfied demand in a state of economic equilibrium on an annual basis; hence, we do not consider the concept of latent (or unfulfilled) demand for energy (which is relevant for low-income regions). Compared to specialised models for transport energy use [e.g. 1, 50, 69], the TIMER model is aggregated and stylized, especially because it does not take into account intermediate variables such as car ownership or person and freight kilometres, or generic concepts like time and money budgets. It also includes energy use for both passengers and freight in a single model.

3.1 Useful Energy Intensity Curve

From energy analysis [see for instance 20, 38, 44] it is known that:

  1. there is a tendency for total energy use to increase with population and economic activity

  2. in many countries, energy intensity tends first to rise and then to decline; this takes place at the level of the whole economy but also at the sector level. This pattern is often referred to as the Environmental Kuznets Curve [for discussions see 52, 62] and is usually explained by structural change processes [for analyses see e.g. 21, 29, 51, 66]. The income level at which such a maximum in intensity is reached tends to decrease over time [6, 22]

These stylized facts are represented in the model equation for useful energy intensity ($Y_{(t)}$) in the form of an (asymmetric) bell-shaped function of the sector-specific per capita economic activity. For each region (R), sector (S) and energy form (F) at time t, this can be expressed as:

$$ Y_{R,S,F(t)} = Y_0 + \frac{1}{\beta \cdot X_{(t)} + \gamma \cdot X_{(t)}^{\delta}} $$
(4)

with $X_{(t)}$ the sectoral economic activity per capita and β, γ and δ parameters (of which δ is negative to maintain the bell shape, see Fig. 2). All parameters in this equation are defined per region, sector and energy form.

Fig. 2 UEI curve (left) and useful energy use per capita (right) for hypothetical parameter values

The flexible formulation of this curve also implies a high sensitivity to parameter values. From an energy point of view, some reasonable constraints can be imposed to limit the potential parameter space to a relevant subspace and to shape the curve on the basis of understandable quantities:

  • The activity level at which the maximum occurs, $X_{\max}$, can be estimated from regional energy use data.

  • The second term of the curve may be related to the saturation level of useful energy per capita per year at high income levels ($U$, see Fig. 2, right graph). This saturation level can be based on sector- and region-specific features such as climate or population density.

  • $Y_0$ can be interpreted as the ultimately lowest energy intensity of sectoral activity (in GJ/$) in both limits $X \to \infty$ and $X \to 0$.

Values and ranges for the parameters β, γ and δ can be derived from these constraints, in combination with the assumption that the curve should be forced through one observed reference point, which can be any year in the period 1971–2003. Each implementation of the curve (as a function of $X_{\max}$, $U$ and $Y_0$) can be characterised by its unique maximum energy intensity, i.e. the top of the curve ($Y_{\max}$, see Fig. 2), determined by:

$$ Y_{\max} = Y_0 + \frac{U}{X_{\max}} \cdot \frac{\delta}{\delta - 1} $$
(5)

We established suitable prior ranges for the variables $X_{\max}$, $U$ and $Y_0$ and translated these into values for the curve parameters β, γ and δ [63].
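Equations 4 and 5 can be sketched as follows; the parameter values one would pass in are hypothetical, and the translation of $X_{\max}$, $U$ and $Y_0$ into β, γ and δ follows [63] and is not reproduced here.

```python
def uei_curve(x, y0, beta, gamma, delta):
    """Eq. 4: useful energy intensity as a bell-shaped function of per capita
    sectoral activity x (delta must be negative to keep the bell shape)."""
    return y0 + 1.0 / (beta * x + gamma * x**delta)

def uei_maximum(y0, u, x_max, delta):
    """Eq. 5: the maximum of the UEI curve, expressed in the interpretable
    quantities Y_0, U and X_max."""
    return y0 + (u / x_max) * delta / (delta - 1.0)
```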

3.2 Autonomous Energy Efficiency Improvement

The continuous decline of energy intensity due to technology change is represented in the TIMER model by the autonomous energy efficiency improvement multiplier. The marginal AEEI is defined as a fraction of economic activity growth [34]:

$$ \text{AEEI}^{\text{marg}}_{R,S(t)} = F_S \cdot \left( \frac{\text{GDP}_{pc,R(t)}}{\text{GDP}_{pc,R(t-1)}} - 1 \right) \cdot 100 \quad (\%/\text{year}) $$
(6)

with $F_S$ a sector-specific fraction of economic activity growth. The vintage structure used to model energy-using capital in TIMER implies that the current AEEI is the weighted average of the marginal AEEI over the capital lifetime [14]. This means that rapid economic growth leads to a faster decline in AEEI, due to both a faster marginal decline and a larger share of the capital stock being relatively new [64]. The parameter that can be used to calibrate the AEEI is the fraction of GDP growth ($F_S$) (see [63] for details).
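A rough sketch of Eq. 6 and of the vintage averaging described above is given below. The actual vintage bookkeeping of energy-using capital in TIMER is not reproduced; a generic weight vector over capital age classes stands in for it, and all names are ours.

```python
import numpy as np

def marginal_aeei(gdp_pc, f_s):
    """Eq. 6: marginal AEEI (%/year) as a sector-specific fraction F_S of
    per capita GDP growth, for a time series of GDP per capita."""
    gdp_pc = np.asarray(gdp_pc, dtype=float)
    growth = gdp_pc[1:] / gdp_pc[:-1] - 1.0
    return f_s * growth * 100.0

def vintage_weighted_aeei(marginal, age_weights):
    """Simplified stand-in for the vintage averaging: the current AEEI as a
    weighted average of recent marginal AEEI values over the capital lifetime."""
    recent = np.asarray(marginal, dtype=float)[-len(age_weights):]
    w = np.asarray(age_weights, dtype=float)
    return float(np.sum(recent * w) / np.sum(w))
```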

3.3 Price-Induced Energy Efficiency Improvement

The PIEEI reflects that, with increasing energy prices, end-users take measures to use energy more efficiently. The description of the PIEEI in TIMER is based on an assumed energy conservation supply–cost curve. By comparing the gains of efficiency improvement (annual saved energy times payback time and energy prices) to the cost of investments, an optimum can be found. Three main factors determine the level of energy efficiency: the form of the supply–cost curve, the value of the payback time and learning-by-doing in energy efficiency technology. In the TIMER model, the energy conservation supply–cost curve can be compared to bottom–up technology data [14] but is modelled as an aggregated, stylized function. The optimal level of energy efficiency (E, as a fraction of total energy use) is defined as the point at which marginal energy conservation measures still yield net revenue:

$$ E_{R,S,F(t)} = M_{R,S,F} - \frac{1}{\sqrt{M_{R,S,F}^{-2} + \frac{C_{R,S,F(t)} \cdot T_{R,S,F(t)}}{S_{R,S,F} \cdot I_{R,S,F(t)}}}} $$
(7)

in which M is the maximum potential price-induced efficiency improvement (as a fraction of total frozen energy use), C the sectoral average costs of useful energy (in $/GJ) and T the (apparent or desired) payback time (in years). I is the dimensionless factor by which the cost curve declines as a result of learning-by-doing. The scaling parameter S is used to scale the curve to the sector-specific costs of useful energy. The PIEEI on marginal investments, which is used in Eq. 3, is a dimensionless multiplier defined as $1 - E_{R,S,F}$. Vintage modelling of energy demand capital delays the impact of the PIEEI, as the current PIEEI is the weighted average of the marginal PIEEI over the capital lifetime.

In the parameter estimation procedure we vary the values of the payback time (T) and the learning parameter (I), using historic energy prices (see [63] for details).
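Equation 7 and the resulting multiplier $1 - E$ can be sketched as below (argument names are ours); in the limit of negligible energy costs the improvement E goes to zero and the multiplier to one.

```python
import numpy as np

def pieei_multiplier(m, cost, payback, scale, learning):
    """Eq. 7 and the resulting multiplier: optimal price-induced efficiency
    improvement E and the PIEEI factor (1 - E) on marginal investments."""
    e = m - 1.0 / np.sqrt(m**-2 + (cost * payback) / (scale * learning))
    return 1.0 - e  # as cost*payback -> 0, E -> 0 and the multiplier -> 1
```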

4 Application to Transport Energy Use

We applied our method for identifying multiple behavioural sets of parameter values to the transport sector energy demand sub-model of TIMER. We performed 100 parameter estimation attempts per region (so N = 100 in $SI_{P,N}$ and $SC_{P,N}$). Section 4.1 discusses the results of calibration to historic data (i.e. steps B and C, explained in Section 2.2). Section 4.2 explores the impact of the calibrated sets of parameter values on future projections (step D of the procedure).

4.1 Calibration to Historic Data

Final energy use of the transport sector in Europe shows an increasing trend, with temporarily slower growth after 1980 due to oil price increases. Generally, the model simulates transport energy use in Europe quite well, with a best NRMSE of 2.8% (Fig. 3). The fluctuations during the 1980s are also well captured (Fig. 3, lower graph). The calibrated parameter values vary over a wide range and only $U$, AEEI and PIEEI have clear relations with the NRMSE, although $X_{\max}$ is generally high and $Y_0$ is low (Fig. 3, upper graphs). About 5% of the sets of parameter values have an NRMSE higher than 10% and can be identified as outliers on the basis of their parameter values. Generally, the parameter values follow two model stories: the best-fitting sets of parameter values have high values for AEEI (>1%/year) and no PIEEI; a second group has low values for AEEI and high PIEEI. The high correlation coefficient between AEEI/PIEEI and the NRMSE (Table 1) also reflects these different implementations of the parameter values.

Fig. 3 Upper graphs: 100 calibrated sets of parameter values for transport sector energy use in Western Europe; each dot represents a calibrated parameter value for the period 1971–2003. Lower graphs: historic and projected transport energy use for Western Europe up to 2030 (left) and histogram of energy use in 2030 using the NRMSE as weighting factor (right). Projections based on OECD-EO scenario inputs and calibrated sets of parameter values

Table 1 Linear correlation coefficient of calibrated parameter values

In the 1971–2003 period, energy use in the transport sector of India grew exponentially. This trend is simulated best with a constant useful energy intensity (in the 1971–2003 GDP/capita range), an AEEI of about 1%/year and no PIEEI (Fig. 4). In relation to the NRMSE, a better fit is obtained with low values for $X_{\max}$, high $U$ and high AEEI. There are no systematic relations between parameters (Table 1), except between the maximum energy intensity ($Y_{\max}$) and the NRMSE (i.e. a lower $Y_{\max}$ leads to a better fit).

Fig. 4 Upper graphs: 100 calibrated sets of parameter values for transport sector energy use in India; each dot represents a calibrated parameter value for the period 1971–2003. Lower graphs: historic and projected transport energy use for India up to 2030 (left) and histogram of energy use in 2030 using the NRMSE as weighting factor (right). Projections based on OECD-EO scenario inputs and calibrated sets of parameter values

Several issues play a role in estimating the model parameters for developing regions. With respect to the UEI curve, these regions have rather narrow absolute GDP per capita ranges between 1971 and 2003 and are forced to lie below the top of the UEI curve (the lower bound of $X_{\max}$ is 5,000 $/capita/year). Historically, useful energy intensity might have been constant, but it can be questioned whether such an implementation of the model is representative outside the range of historically observed economic activity. Another source of model error in India (and other developing regions) might be that the TIMER model does not capture some concepts that are important for developing countries (e.g. urban/rural differences and unequal income distribution [62]) and ignores the role of specific technologies (e.g. the modal split in transport).

4.2 Impact on Future Projections

To determine the influence of the different sets of parameter values on future projections of the model, we calculate the projected energy demand in 2030, using scenario inputs of the OECD environmental outlook scenario [OECD-EO, described in detail in 2, 40]. These scenario inputs include projections for GDP, sectoral value added and population. The OECD-EO is a baseline scenario without new policies on economy and environment, in which energy use is based on moderate projections of population and economy. In this analysis we use the same energy prices for all forward calculations; these prices correspond with the default implementation of this scenario.

The TIMER model was used in its original setting within the OECD-EO study to project the development of the future energy system, including transport energy demand. These projections can differ considerably from those presented here because (1) the TIMER modellers focused in model calibration not only on the performance of a single region but also aimed at similar parameter settings across regions and (2) they also calibrated the model projections against the IEA World Energy Outlook.

The projections of future transport sector energy use in Western Europe, based on the calibrated sets of parameter values, show a slowly increasing energy use toward 15–25 EJ/year in 2030. These projections vary over a wide range (Fig. 3); expressed as a percentage around the ‘best fit’ in 2030, this range amounts to 79% (Table 2). The outliers (with NRMSE values above 8%) are generally on the lower bound of the future projections (below 20 EJ/year in 2030). The most behavioural sets of parameter values and the OECD-EO scenario are on the lower bound of this range. However, most sets of parameter values (weighted by the NRMSE) project an energy use of 19–23 EJ/year in 2030, higher than the best-fitting parameter set and the OECD-EO scenario.

Table 2 Correlation coefficient between calibrated parameter values and projected energy use in 2030 for the transport sector

Forward calculations for India indicate an increase of transport sector energy use from 1.5 EJ/year in 2003 to 2.5–3 EJ/year in 2030. Relative to the ‘best fit’, the range for India is narrow: only 44%. The OECD-EO scenario is clearly above the range of projections, reaching 4 EJ/year in 2030. The outliers for India (with NRMSE values above 5%) generally project higher future energy use (above 2.8 EJ/year in 2030). Projected energy use correlates most strongly with AEEI and $Y_{\max}$: a higher AEEI (and thus a better fit) leads to lower projected energy use (Table 2).

Another issue of interest is which parameters mainly influence the projected energy use. This is explored in Table 2, which shows the correlation between the calibrated parameter values and projected energy use in 2030. AEEI and UEI seem to be the most influential model parameters; the minor role of the PIEEI might be related to the slow increase of energy prices in the OECD-EO scenario.

5 Method Evaluation

Several remarks can be made about the presented method to identify variation in model calibration parameters. Because the method applies an optimisation algorithm to minimise the error between model results and data, it does not guarantee identification of the full fit landscape. In particular, if the fit landscape is flat, the algorithm identifies the best-fitting (local) optimum, possibly ignoring other well-fitting sets of parameter values that have a slightly higher NRMSE. This indicates that the uncertainty from equifinality in forward projections might be larger than estimated in this study. A detailed Monte Carlo sampling analysis would guarantee that the whole fit landscape is identified. However, we found in early stages of this analysis that equifinality sometimes occurs within very small ranges of the parameter values. Hence, the sampling has to be very detailed in order not to overlook the relevant parameter values, driving up calculation time. We used optimisation to efficiently scan the parameter space and partly overcame this issue by initialising the parameter estimation process from many different locations in the parameter space (including ‘design of experiments’ to initialise at the corners of the parameter space). However, advanced adaptive sampling methods (see for instance Hendrix and Klepper [24]) might be better able to identify the full range of equifinality.

For this model we chose 100 different initialisations, balancing calculation time against the size of the database. Analysis of the results shows that for this model the shape of the distribution of the parameters and the NRMSE did not change significantly after 60 to 80 parameter estimation attempts. We expect this to be specific to each model. If this automated calibration procedure were applied to another model, convergence of the NRMSE and the shape of the parameter distributions should be monitored to see whether enough initialisations have been chosen. It is clear that the method also identifies outliers, cases in which the optimisation algorithm terminates at relatively high NRMSE values. In the analysis that we performed, about 5–10% of the calibrated sets of parameter values could be identified as outliers. We conclude that the estimation technique performs well and that most of the identified variation can be attributed to the model at hand.

In the error model that we use, we oversimplified by attributing the difference between modelled and observed values entirely to parameter error. One could extend the method to give more attention to measurement error in the observations (both economic and energy use data are far from certain), for instance by adding white noise to the calibration variable, or to input and boundary condition errors. In the specific case of TIMER, an error distribution on the reference energy intensity for the UEI curve might deal with data error and allow a broader range of sets of parameter values to be behavioural with the data. Another issue in the TIMER case is that parameter error and model structure error can hardly be separated, because the parameters related to the UEI curve can change the functional form of the model dramatically (e.g. from bell-shaped to linear).

The development of the described method was inspired by the concept of equifinality, developed by Beven on the basis of his experiences with the GLUE methodology. The GLUE methodology has recently been the subject of a scientific debate on its consistency with Bayesian statistics. A major criticism of GLUE was its application of ‘less formal likelihood’ measures; this may imply that it loses the learning properties of the Bayesian approach, leading to ‘flat’ parameter posterior densities, so that equifinality is built into the methodology [35, 36]. In response, it has been argued that if strong assumptions about the error model cannot be justified, GLUE provides a reasonable alternative [9]. The method applied here differs from both Bayesian updating and GLUE, because it does not apply sequential Monte Carlo analysis. Moreover, it also has elements of nonlinear regression methods like PEST and UCODE, in that its purpose is to identify ‘peaks’ in the fit landscape. Therefore, we conclude that this discussion does not apply to our method.

6 Discussion, Conclusion and Implications

Energy use modelling encompasses many scientific paradigms and traditions, which lead to different interpretations of past and present and to different expectations of the future. Even within one model, several options may exist for interpreting the past and current situation. We therefore developed a method to identify the range of sets of parameter values that perform reasonably against historic data and to analyse the impact of these different calibrations on future projections. The essence of this method is that, by varying several essential parameter values, we minimise the error between model results and observations. By repeating this parameter estimation procedure, starting from different locations in the parameter space, we were able to identify a range of local optima in the error landscape within the parameter space. These co-existing interpretations (i.e. values of essential parameters) that explain historic energy use comparably well are incorporated in the prediction ensemble.

In the energy demand modelling of the TIMER model, different parameter sets can be observed that all lead to a reasonable calibration (equifinality). From the application of this method to the TIMER 2.0 energy demand model for the transport sector, we found that its model formulation, in combination with the aggregated character of the energy statistics available for calibration, leaves room for multiple behavioural sets of parameter values. In the given model formulation, the different options for calibrated parameter values are related to the balance between useful energy intensity and energy efficiency improvement. Generally, high useful energy intensity combined with major efficiency improvements leads to similar results as low energy intensity and stagnant efficiency improvement.

Different model calibrations lead to different future projections: the range in outcomes is about 44–79% around the best-fit option, showing that different (behavioural) sets of parameter values can lead to a wide range of future projections. AEEI and useful energy intensity are the most decisive model aspects with respect to future energy levels.

Equifinality in the TIMER model can partly be reduced by further model development. What does this analysis imply for the application and development of the TIMER model? Given the aggregate nature of both model and data, some parameter ambiguity is inevitable and does not a priori disqualify the model. For the existing model, a workable situation can be created by using the ‘best-fit’ calibrated parameter values and communicating the calibration uncertainty range with the model results. More fundamentally, two options exist for model improvement. First, the data-based solution would be model reduction. However, because the model only involves three well-established concepts (energy intensity and autonomous and price-induced efficiency improvement), model reduction implies econometric curve-fitting. A second option is to give the model a more bottom–up character and use the increasingly available data and insights on the underlying physical activity (in this specific case: data on person or freight kilometres, or ownership of cars, trucks, planes etc.; and the concepts of time and money budgets). Such a development would lead to two major improvements: first, it provides an extra model layer (of physical activity) that can be calibrated to data and second, such a model enhances insight into the actual activity that is simulated and projected.