Introduction

This paper evaluates the effects of mandatory building energy performance certificates (EPCs) on home heat loss in existing residential buildings, by focusing on the influence of building fabric while excluding occupants’ behaviour. In order to improve building energy efficiency in member states, the European Union (EU) has adopted the Energy Performance of Buildings Directive, which provides guidance and information for buyers and tenants through energy performance certificates (European Commission, 2002; European Commission, 2021). Building energy performance ratings, such as those under the EU’s ‘Energy Performance of Buildings Directive’ and the ‘Energy Star Certified Homes’ in the USA, have been a central element of energy policies to promote investment in energy efficiency and to meet targets of greenhouse gas emissions reduction. [Footnote 1]

EPCs are broadly used as a policy metric within the residential sector for achieving ambitious climate targets (Economidou et al., 2020). As part of its building decarbonisation strategy, the Irish government, for instance, has set a goal to upgrade half a million existing homes (\(\approx 25\%\)) to a B2 rating by 2030, through an €8 billion retrofit scheme (CAP, 2021). EPCs can also be used to identify areas at a high risk of fuel poverty (Camboni et al., 2021; Few et al., 2023) and determine eligibility for home retrofit grants (Semple and Jenkins, 2020; Van Hove et al., 2023). At the household level, EPCs serve as benchmarks for property renovation and decisions on buying, renting, and selling properties (Li et al., 2019; Fregonara et al., 2017).

However, EPCs are based on projections from engineering models, which use physical principles to calculate the thermal dynamics and energy behaviour of a building (Foucquier et al., 2013; Zhao and Magoulès, 2012; Pérez-Lombard et al., 2009). In generating the EPCs, these simulation models assume standardised values for the number of occupants, energy use schedules, and other parameters (Wenninger and Wiethe, 2021; Amasyali and El-Gohary, 2018). By their nature, EPCs do not, therefore, capture the full nuance of actual energy performance. There are also significant variations in the methods and input data used to assess the energy performance of residential buildings across European countries (Semple and Jenkins, 2020). Li et al. (2019) discuss the challenges surrounding EPCs and their limitations, which lead to a discrepancy between expected and observed energy performance (Coyne and Denny, 2021; Cozza et al., 2020; Zou et al., 2018; Van den Brom et al., 2018; Gram-Hanssen and Georg, 2018; Majcen et al., 2013; De Wilde, 2014). This difference is commonly known as the Energy Performance Gap. To improve the accuracy of EPC energy consumption predictions and overcome the shortcomings of engineering models, recent research has employed data-driven methods (e.g. Amasyali and El-Gohary, 2018; Bourdeau et al., 2019; Mutani and Todeschi, 2021; Pasichnyi et al., 2019; Wenninger and Wiethe, 2021).

Occupant behaviour is widely discussed in the literature as a main driving factor of the gap between expected and actual energy use (e.g., Gillingham et al. 2020; Fowlie et al. 2018; Aydin et al. 2017; Sunikka-Blank and Galvin 2012; Sorrell et al. 2009). The behaviour of occupants regarding the temperature at which they heat their dwelling, duration, and timing of heating may differ from the standardised assumptions made in projection models (Van Hove et al., 2023). Many of the empirical studies comparing ex-ante projected energy performance with actual energy consumption are unable to fully disentangle building fabric performance from the intensity of occupants’ behavioural effects (Coyne and Denny, 2021; Cozza et al., 2020; Zou et al., 2018; Van den Brom et al., 2018; Gram-Hanssen and Georg, 2018; Majcen et al., 2013; De Wilde, 2014). For instance, energy consumption data will reflect occupants’ preferences for ambient internal temperature, or hot water demand, which are distinct from building fabric performance. Understanding the potential magnitude of such attenuation is important to validate the projected estimates of climate impact measures.

Our paper contributes to the energy performance gap literature by isolating building fabric performance from occupant behavioural effects in order to examine the relationship between building fabric performance and energy performance certificate ratings. This is relevant as some recent ex-post evaluations have cautioned policymakers against relying on the theoretical energy use reported in energy performance certificates as a mechanism to deliver real energy savings, or have cast doubt on the projected benefits of energy efficiency investments (e.g. Coyne and Denny, 2021; Davis et al., 2020; Fowlie et al., 2018; Levinson, 2016; Van Hove et al., 2023).

This study conducts an ex-post evaluation of the effect of EPCs on home heat loss in existing residential buildings. As direct measurement of heat loss is not possible, we use indoor temperature as a proxy. We exploit a high-frequency panel dataset of indoor temperature and heating system operation over a 2-year period. This is in contrast to many studies that rely on (bi-)monthly metered energy consumption data. These data are matched with information on weather and property energy performance, as measured by EPCs. To isolate building fabric performance from occupant behavioural impacts, the analysis focuses on data from the early morning hours (midnight–6:00 a.m.) when the heating system is confirmed as being turned off and behavioural impacts, such as secondary heating, are less likely to arise. This allows us to clearly evaluate the impact of building fabric on temperature change within the dwelling and the insulative performance of the building fabric. Dwellings with well-insulated building fabrics are expected to have a lower drop in indoor temperature, a proxy for heat loss, controlling for external weather variables and abstracting from occupants’ behaviour. By examining the building fabric’s performance, we aim to evaluate the building material’s ability to retain heat and minimise heat loss; the building envelope is therefore the relevant object of study here. This is particularly important for evaluating the energy efficiency of a building in cases where occupant behaviour deviates significantly from the behaviour assumed in the model simulation.

To estimate how well EPCs predict actual performance, we model indoor temperature at a given hour as a function of external temperature, relative humidity and wind speed, indicators for building energy performance rating, and the previous hour’s indoor temperature. In our panel data modelling approach, we incorporate dynamic panel data model specifications that take into account variations in observed and unobserved time-invariant variables, such as the construction of the building fabrics, size and type of the dwelling, and efficiency and size of the heating unit, across the dwellings’ EPCs.

Our results show that EPCs significantly affect indoor temperature, a proxy for home heat loss. However, we do not find evidence to support the distinct gradient along the building energy performance scales as suggested by ex-ante estimates of home heat loss. In a related study, Few et al. (2023) matched homes based on the assumption that EPC projections depend on factors such as occupancy, thermostat set point, and whole home heating, and found that projected energy use exceeds actual energy use, with the gap widening as the energy efficiency rating of homes decreases. Our findings support the notion that in addition to occupants’ behaviour, other factors such as the use of default thermal transmittance in the absence of required data (Raushan et al., 2022; Ahern and Norton, 2020), measurement errors, and uncertainty in data quality (Crawley et al., 2019; Hardy and Glew, 2019; Mangold et al., 2015; Abela et al., 2016; Christensen et al., 2021) contribute to the discrepancy between projected and actual energy use.

The remainder of this paper is structured as follows: the ‘Institutional setting’ section presents the institutional setting. The ‘Data’ section outlines the data employed in this analysis. The ‘Methodology’ section provides the empirical strategy. The ‘Results’ section presents the results. The ‘Discussion’ section discusses the results. Finally, the ‘Conclusions’ section concludes the paper.

Institutional setting

The EU Energy Performance of Buildings Directive (EPBD) was first introduced in 2002 and recast in 2010 and 2021, with the aim of improving the energy performance of buildings in the member states (European Commission, 2002; European Commission, 2021). Among other measures, the EPBD requires EU Member States to provide information on a building’s energy performance through the use of Energy Performance Certificates. The rationale behind this requirement is straightforward: salient information on a dwelling’s energy performance can help guide individual decision-making toward the achievement of EPBD energy efficiency goals.

EPCs provide information to consumers on buildings they plan to purchase or rent. They include an energy performance rating and recommendations for cost-effective improvements. Certificates must be included in all advertisements in commercial media when a building is offered for sale or rent. They must also be shown to prospective tenants or buyers when a building is being constructed, sold, or rented. Following the EU EPBD, Ireland adopted a mandatory energy performance certificate programme. This programme began on 1 January 2009, and the certificate is known as the Building Energy Rating (BER). By law, all new homes and homes for sale or rent are obliged to have a BER certificate for the purpose of providing information in advance to prospective tenants and purchasers of the home (SEAI, 2022a).

For each building, the BER certificate provides an estimation of energy use associated with lighting, ventilation, space heating, and water heating (SEAI, 2022b). It does not include electricity used for cooking, refrigeration, laundry, and entertainment. The energy performance of a building is expressed in terms of primary energy use per square metre of floor area per year (kWh/m2/yr) on a 15-point scale from A1 to G, together with the associated carbon dioxide (CO2) emissions in kgCO2/m2/yr. Figure 3 demonstrates how the 15 BER scales (A1–G) map to the BER in kWh/m2/yr. The rating scale is similar to the EU energy labelling for products subject to energy labelling regulation (EC, 2017). A1-rated properties, with an energy performance rating of up to 25 kWh/m2/year, are the most energy efficient. On the other end of the scale, G-rated properties, with an energy performance rating of more than 450 kWh/m2/year, are the least energy efficient (SEAI, 2022b).
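The mapping from a kWh/m2/yr value to a BER band can be sketched as a simple threshold lookup. Only the A1 (up to 25) and G (more than 450) limits are stated in the text; the intermediate thresholds below are an assumption taken from the published SEAI scale and should be checked against Figure 3 before use. The function name `ber_band` is hypothetical.

```python
from bisect import bisect_left

# Upper bounds (kWh/m2/yr) of every band except G. Only the A1 (<= 25) and
# G (> 450) limits come from the text; the others are ASSUMED from the SEAI scale.
UPPER_BOUNDS = [25, 50, 75, 100, 125, 150, 175, 200, 225, 260, 300, 340, 380, 450]
BANDS = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3",
         "D1", "D2", "E1", "E2", "F", "G"]


def ber_band(kwh_per_m2_yr: float) -> str:
    """Return the BER band for a primary energy use value in kWh/m2/yr."""
    if kwh_per_m2_yr < 0:
        raise ValueError("energy use cannot be negative")
    # bisect_left returns the index of the first upper bound >= the value,
    # so a value exactly on a bound falls in the more efficient band.
    return BANDS[bisect_left(UPPER_BOUNDS, kwh_per_m2_yr)]


print(ber_band(25), ber_band(242), ber_band(451))
```

Under these assumed thresholds, a value of 242 kWh/m2/yr (close to the sample mean reported later in the paper) would fall in band D1.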

The Irish BER certificate is administered by the Sustainable Energy Authority of Ireland (SEAI). The assessments are completed by SEAI-registered BER assessors, and the certificate is valid for up to 10 years. A BER certificate becomes invalid if there are modifications that could significantly affect energy performance (SEAI, 2022a). The BER assessment follows a standardised Dwelling Energy Assessment Procedure (DEAP) where property fabric and heating systems are inspected (DEAP, 2022). The DEAP accounts for factors such as property dimensions (size and geometry), construction material, thermal insulation of building fabric, ventilation (air infiltration due to openings and air tightness of the structure), characteristics of space and water heating systems, solar gains through glazed openings, property thermal storage (mass) capacity, fuel used for heating, and renewable and alternative energy generation technologies.

Data on BER-assessed properties are freely available on the SEAI website. [Footnote 2] In addition to the BER rating in kWh/m2/year and corresponding scales, the database contains information on the size and type of property, year of construction, fuels used by the main space heating system, and the thermal transmittance of building fabrics and associated area of exposed and semi-exposed parts of the buildings. As of the beginning of February 2022, BER assessments had been completed on more than 960,000 properties. This corresponds to around 52% of the total number of occupied houses recorded in the 2022 Irish census (CSO, 2022).

Data

Data sources

We wish to analyse the relationship between BER scales (an Irish term for EPCs) and the insulative performance of a property, as revealed ex-post by a change in observed temperature. For this analysis, we use smart thermostat data which provides information on indoor temperature and heating system operation at a property level. This is matched to two datasets. First, each property is assigned a BER value using the online public search facility. Secondly, the concurrent outdoor temperature and weather conditions are matched using data from the Irish Meteorological Service. Each data source will be outlined in turn.

High-frequency data detailing indoor temperature and heating system operation for the main living space [Footnote 3] of each dwelling are sourced from a Hub Controller, an automatic energy manager device with smart thermostat functionality, hereinafter referred to as the ‘smart thermostat’. Additional variables in this dataset include humidity of the living space, thermostat set-points, and whether an operational boiler (gas or oil) is in heating or boost mode. The smart thermostat unit reports this information at regular intervals, on average every 3 min. Our dataset comprises approximately 10,000 Irish homes for 24 months: October 01, 2019–September 30, 2021. [Footnote 4]

These data are matched with EPC data from each household’s Irish BER certificate, providing information on the household’s BER rating, both in terms of primary energy use per square metre of floor area per year (kWh/m2/yr) and on a 15-point scale from A1 to G. The BER certificate also contains information on dwelling floor area and estimated carbon dioxide (CO2) emissions in kgCO2/m2/yr, alongside information on the reason for obtaining the BER certificate.

The final data source employed in this analysis is local weather data from Ireland’s National Meteorological Service, Met Éireann. [Footnote 5] The weather data consist of hourly air temperature (°C), relative humidity (%), wind speed (knots), sunshine duration (% per hour), and precipitation (mm). Properties in the smart thermostat dataset are located in the greater Dublin area. Consequently, we use data from the Dublin Airport weather station. After constructing the relevant variables from the high-frequency smart thermostat data at an hourly level, we match the weather data to the smart thermostat data of each property at an hourly level.

Data processing

We process the data by limiting the analysis to time periods where changes in temperature are plausibly influenced by the observed variables of ambient temperature and BER rating alone. To do so, we restrict the time period of analysis to the core winter heating months in Ireland: December to February. To abstract from occupant behaviour, we limit the data to the early morning hours of 00:00 to 05:59 inclusive, conditional on the heating system being turned off. The motivation underlying this restriction is as follows. It is plausible that there is no secondary energy input during this time, such as an open fire, and therefore, the rate of temperature change is a reflection of the insulative capacity of the building. If the heating system is turned on prior to 06:00 a.m., we exclude all subsequent data points from that analysis window. If a heating system is switched off for a lengthy period prior to 00:00, it may be difficult to capture how heat loss is associated with BER rating, as heat loss has already occurred. Consequently, we limit the analysis to properties that were heated in any of the 12 h prior to midnight.
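The restrictions described above can be sketched as a small filter applied to one property's readings for one night. This is a minimal illustration, not the authors' code: the tuple layout `(timestamp, indoor_temp, heating_on)` and the function name `night_window` are assumptions made for the example.

```python
from datetime import datetime, timedelta

WINTER_MONTHS = {12, 1, 2}  # core heating months named in the text


def night_window(readings, night):
    """Return the heating-off readings for one early-morning window.

    readings: time-sorted list of (timestamp, indoor_temp, heating_on) tuples
              for a single property (hypothetical layout).
    night:    datetime at 00:00 of the morning being analysed.
    """
    if night.month not in WINTER_MONTHS:
        return []
    start, end = night, night + timedelta(hours=6)
    # Keep the night only if the heating ran at some point in the 12 h before midnight.
    heated_before = any(
        on for ts, _, on in readings if start - timedelta(hours=12) <= ts < start
    )
    if not heated_before:
        return []
    kept = []
    for ts, temp, on in readings:
        if ts < start or ts >= end:
            continue
        if on:
            break  # heating came on: drop this reading and everything after it
        kept.append((ts, temp))
    return kept
```

A night is dropped entirely when the property was not heated in the preceding 12 h, and truncated at the first reading where the heating is on.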

How do we construct the hourly indoor temperature from the smart thermostat high-frequency data, in which temperature readings and heating system status (on or off) are recorded approximately every 3 min? As indicated above, the data for our analysis is limited to early morning hours from 00:00 to 05:59 inclusive, conditional on the heating system being switched off. From the raw data, we construct indoor temperature at a 1-h interval, starting at 00:00 up to 05:00. As the smart thermostat data may not have a recording of the indoor temperature exactly at 00:00, we extract the date-time stamp and associated indoor temperature of the first reading closest to 00:03 (± 3 min of 00:00, considering the thermostat frequency of 3 min). We then retrieve the subsequent temperature readings at 01:03 after 1 h, at 02:03 after 2 h, and so on, up to 05:03 after 5 h. Figure 4 illustrates how the indoor temperature data is extracted from the high-frequency smart thermostat data for an example starting at 00:03 and the subsequent 5 hourly data points (at 01:03; 02:03; 03:03; 04:03; 05:03). While analysis at a sub-hourly frequency is possible, the resource intensity for some of the statistical methods subsequently employed increases non-linearly. Hence, the analysis is undertaken at an hourly frequency without any loss of information pertinent to the analysis. Upon completion of this data processing, 703 properties remain in the dataset with a total of 356,318 hourly observations for analysis.
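The hourly extraction described above can be sketched as follows. This is an illustrative reading of the procedure under stated assumptions: the first reading in the window is taken as the starting point, later readings are matched to whole-hour offsets from it within a 3-min tolerance, and the function name `hourly_series` and the example data are hypothetical.

```python
from datetime import datetime, timedelta


def hourly_series(window, tolerance=timedelta(minutes=3)):
    """Pick one reading per hour from a night's heating-off readings.

    window: time-sorted list of (timestamp, indoor_temp) tuples.
    Returns up to six tuples: the first reading and the readings closest to
    1, 2, ..., 5 hours after it.
    """
    if not window:
        return []
    start_ts, start_temp = window[0]
    picked = [(start_ts, start_temp)]
    for k in range(1, 6):
        target = start_ts + timedelta(hours=k)
        ts, temp = min(window, key=lambda r: abs(r[0] - target))
        if abs(ts - target) > tolerance:
            break  # no reading near this hour mark, e.g. the window was truncated
        picked.append((ts, temp))
    return picked


# Hypothetical night: a reading every 3 minutes from 00:03, cooling slowly.
first = datetime(2020, 1, 15, 0, 3)
night = [(first + timedelta(minutes=3 * i), 18.0 - 0.01 * i) for i in range(110)]
for ts, temp in hourly_series(night):
    print(ts.strftime("%H:%M"), round(temp, 2))
```

Stopping at the first missing hour mark keeps each night's series contiguous, which matters when the previous hour's temperature is used as a regressor.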

Descriptive statistics

This section provides insight into the distribution of the assembled data. First, though not intended to be representative of the national housing stock, the degree to which the matched dataset reflects the national distribution is explored. Table 1 compares the distribution of the 703 matched observations to the 967,608 residential properties with a BER assessment as of February 2022. Column 1 in Table 1 shows that about 90% of the sample have a ‘C1’ rated property or lower compared to approximately 80% of properties in the BER database (in column 3). The mean BER rating is about 242 kWh/m2/year for both the smart thermostat sample properties and the entire BER database. On average, the 703 properties are older and smaller in terms of property floor area and living room area compared to the national BER database. The average number of years since a BER assessment is similar at approximately 6 years. Since the 703 sample properties are from the greater Dublin area, we have also included the corresponding characteristics of BER-assessed residential properties in Dublin in column 2. Properties in Dublin on average have relatively higher energy efficiency compared to our sample or the national housing stock. The Hub Controller smart thermostat was installed in dwellings constructed prior to 2006. Consequently, we do not expect our sample to be representative of the housing stock, even in the Dublin area.

Table 1 Descriptive statistics of sample properties
Table 2 Summary statistics of indoor temperature and weather variables for the 703 sample properties
Fig. 1 Density of indoor temperature over 6 h when a heating system was off

Second, we explore the distribution of indoor temperature and outdoor weather conditions during the sample period. Table 2 provides summary statistics of indoor temperature (°C), outdoor temperature (°C), relative humidity (%), and wind speed (knots) at an hourly level for the 703 sample properties when the heating system was off during the interval 00:00–05:59. The mean indoor temperature in the 703 properties was 16.58°C in the 6-h period, 00:00–05:59, across the 3 months (December–February) over 2 years. Table 2 also reports corresponding values for properties by BER rating. [Footnote 6] There is a high level of variability of indoor temperatures across properties with minimum and maximum values of 6°C and 34°C. [Footnote 7]

The distribution of temperatures is plotted in Fig. 1. The solid red line depicts the density of the indoor temperature readings at 00:03 across the 703 sample properties when a heating system was off. The solid black line shows the density of indoor temperature 5 h after the initial readings at 00:03. The mean indoor temperature declines from 17.62°C at 00:03 to a mean of 15.70°C after 5 h. This is an average drop of about 2°C (1.92°C) in indoor temperature over 5 h while the heating system was turned off throughout.

We further break down the average indoor temperature by hours across BER scales. Table 3 shows the average indoor temperature and its difference over hours across BER scales. In the first hour, the overall average drop in indoor temperature is about 0.54°C, and the hourly drop continues to shrink towards zero (a steady state point), with small variations across the BER scales. The decline in temperature after a heating system is turned off is anticipated. The research question is to what extent the decline in temperature systematically varies by BER rating of properties. When comparing the mean temperature values in Table 3, there is a slightly greater decline in temperature among lower energy efficiency-rated properties. However, the decline in indoor temperature among buildings with lower BER ratings (less energy efficient) is less than what was anticipated. One possible reason is that these inefficient buildings are older and have undergone significant improvements to their building structure since they were initially constructed. Further discussion on this and a more systematic investigation approach are provided in the subsequent sections.

Table 3 Average indoor temperature by hours across BER scales for the 703 sample properties

Methodology

We seek to understand the extent to which building energy performance certificates capture the insulative capacity of the home. We use changes in indoor temperature as a proxy for unobserved home heat loss. In the absence of energy usage data or as an alternative to it, a similar temperature-based approach has been previously used to assess a building’s thermal performance (Albatayneh et al., 2019), to identify abnormal energy consumption faults (Lin and Claridge, 2015), and to investigate the effects of climate change on building energy performance (Congedo et al., 2021).

There are a number of confounding factors relating to energy use and behaviour that must be incorporated into the analysis. For example, households with large heating demand could self-select into A- or B-rated buildings. Energy-efficient households may also self-select into better-rated buildings or may have different preferences for indoor temperature. To address these and other effects relating to occupants’ behaviour, we limit our analysis to early morning hours when a heating system is off. The occupants’ behavioural impact is anticipated to be minimal during this time, as potential secondary heating sources (e.g. open fire, portable heaters) are less likely to be operational. In following this approach, we isolate the effects of building fabric from occupants’ behaviour on indoor temperature.

The underlying premise of our analytical approach is that temperature within the property in the early morning hours, isolated from occupant behaviour, is a function of three factors. The first and potentially greatest impact relates to temperature inertia. If a property had a high temperature reading 1 h ago, its current temperature reading is also likely to be relatively high. This autoregressive approach for modelling heating is widely used (Massana et al., 2017; Fazeli et al., 2016; Fang and Lahdelma, 2016; Powell et al., 2014). To fully exploit this inertia, we limit the period of analysis to the winter months (December–February) when a heating system is likely to be operational. The second factor relates to the insulative capacity of building fabric as measured by BER. This is our subject of interest. The third factor is ambient weather. Internal temperature is affected by external temperature, humidity, and wind conditions.

Modelling temperature as a function of lagged temperature values introduces a potential source of endogeneity as the lagged dependent variable is likely to be correlated with the error term (Anderson and Hsiao, 1981). A common solution is to adopt dynamic panel models using a generalised method of moments (GMM) estimator (e.g. Arellano and Bond, 1991; Arellano and Bover, 1995; Blundell and Bond, 1998; Roodman, 2009). Our panel dataset has many time periods, which presents some modelling challenges. A standard panel comprises N units of analysis (e.g. properties) across T time intervals (e.g. hours or years). There are in excess of 600 time intervals in the current dataset for some properties. Dynamic panel estimators (e.g. Arellano and Bond, 1991; Arellano and Bover, 1995) are designed for situations with small T, as the number of instruments increases quadratically in the number of time periods, making estimation of large T models resource intensive and practically difficult. We follow three estimation strategies to address the challenge, which in practice return broadly similar results.
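The quadratic growth in instruments noted above can be illustrated with a back-of-the-envelope count. This assumes the standard Difference GMM instrument set, in which every available lag of the dependent variable (from \(t-2\) back) instruments the differenced equation at period \(t\), and it ignores instruments for other regressors; the paper does not state its exact instrument set, so the figures are illustrative only.

$$\begin{aligned} \#\text{instruments}(T) &= \sum_{t=3}^{T} (t-2) = \frac{(T-1)(T-2)}{2}, \\ T = 6 &: \quad \frac{5 \times 4}{2} = 10, \\ T = 600 &: \quad \frac{599 \times 598}{2} = 179{,}101. \end{aligned}$$

Under this assumption, a six-period panel needs only a handful of instruments, whereas a 600-period panel would need well over a hundred thousand.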

Standard panel data estimator

In the first strategy, we specify a panel data model where the time dimension is the hourly smart thermostat data frequency while the panel dimension is residential property. The model is estimated using a standard random-effects panel estimator. Such an approach does not address the potential for biased coefficients associated with dynamic panels, termed ‘dynamic panel bias’ (Nickell, 1981); however, with large T the bias is small. [Footnote 8] The model is outlined in Eq. (1):

$$\begin{aligned} Temp_{ihdmy} &= \alpha + \beta Temp_{i(h-1)dmy} + \gamma Efficiency_{i} \\ &\quad + \delta Weather_{hdmy} + \epsilon_{ihdmy} \end{aligned}$$
(1)

where \(Temp_{ihdmy}\) is indoor temperature (°C) in property i, at hour h, day d, month m, and year y. The indoor temperatures are those recorded by the smart thermostat at hourly intervals in the early morning hours. \(Efficiency_{i}\) is a measure of the energy efficiency rating of property i, for which we use the property’s BER assessment, specified either as a categorical scale (A–G) or in \(kWh/m^{2}/year\). In addition to projected energy use, the BER variable considers the thermal transmittance (U-value in W/m2K) of various building components, including walls, roofs, floors, windows, and doors. Energy efficiency is improved when these components have lower U-values, indicating higher insulation levels. External weather variables (\(Weather_{hdmy}\)) include mean hourly outdoor temperature (°C), mean hourly outdoor relative humidity (%), and wind speed (knots). \(\epsilon _{ihdmy}\) is the stochastic disturbance term that accounts for measurement errors as well as unobserved variables that have the potential to influence the dependent variable. \(\alpha \), \(\beta \), \(\gamma \), and \(\delta \) are parameters to be estimated. \(\gamma \) is our main parameter of interest that captures the effect of building energy ratings on indoor temperature, a proxy for home heat loss.
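One way to read Eq. (1), offered here as an illustrative derivation rather than a result from the paper, is to hold weather fixed, ignore the disturbance term, and drop the subscripts other than the hour. The symbol \(Temp^{*}\), the temperature at which the hourly change is zero, is introduced only for this illustration.

$$\begin{aligned} Temp_{h} - Temp_{h-1} &= \alpha + \gamma Efficiency + \delta Weather - (1-\beta )\, Temp_{h-1}, \\ Temp^{*} &= \frac{\alpha + \gamma Efficiency + \delta Weather}{1-\beta } \quad (|\beta | < 1). \end{aligned}$$

With \(\beta < 1\), the hourly change shrinks towards zero as the indoor temperature approaches \(Temp^{*}\). A more negative \(\gamma \) lowers \(Temp^{*}\) and implies a larger hourly drop from any given starting temperature, which is the sense in which \(\gamma \) serves as a proxy for heat loss.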

Arellano-Bond type dynamic panel estimator

The second estimation strategy is to follow the common approach for estimating panel data with lagged dependent variables, which explicitly addresses dynamic panel bias (Arellano and Bond, 1991; Arellano and Bover, 1995; Blundell and Bond, 1998; Roodman, 2009). However, as noted earlier, such models are designed for situations with small T, and estimation with large T datasets is resource intensive. To counter the estimation issues associated with large T in such estimators, we restructure the data in the following manner. For each property, we use the mean temperature values at each hour for every month and year, as specified in Eq. (2).

$$\begin{aligned} \overline{Temp}_{imyh} = \frac{1}{D} \sum_{d=1}^{D} Temp_{ihdmy} \quad \forall h,m,y \end{aligned}$$
(2)
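The averaging in Eq. (2) amounts to a group-by over property, month, year, and hour. The following is a minimal sketch under the assumption that each observation is a tuple `(property_id, year, month, day, hour, indoor_temp)`; the layout and the function name `monthly_hourly_means` are hypothetical.

```python
from collections import defaultdict


def monthly_hourly_means(rows):
    """Average indoor temperature over days, as in Eq. (2).

    rows: iterable of (property_id, year, month, day, hour, indoor_temp).
    Returns {(property_id, month, year, hour): mean temperature over days}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of temps, number of days]
    for prop, year, month, _day, hour, temp in rows:
        acc = totals[(prop, month, year, hour)]
        acc[0] += temp
        acc[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical observations for one property in January 2020.
rows = [
    ("p1", 2020, 1, 14, 0, 18.0),
    ("p1", 2020, 1, 15, 0, 17.0),
    ("p1", 2020, 1, 14, 1, 17.5),
]
print(monthly_hourly_means(rows))
```

Each key of the result corresponds to one cell of the restructured panel: a property-month-year unit observed at a given hour.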

We then specify the time dimension solely as the hour index (h), representing the early morning hours (\(h \le 6\)). The panel dimension is represented by an index of property-month-year (\(imy > 3100\)). The goal is to estimate the effects of building energy performance, but the variable of interest is time-invariant. One strategy to address this problem is to conduct a panel analysis with a two-stage GMM procedure (Kripfganz and Schwarz, 2019). In the first stage, we use the GMM approach to estimate the time-variant variables. The model estimated is Eq. (3), where subscripts i, m, and y from Eq. (2) are subsumed as a single index representing the property-month-year, though we still use imy as a subscript for clarity. The estimated parameters include \(\alpha , \beta , \delta \), with the \(\gamma Efficiency_{imy}\) term dropping out when first differences are taken during GMM estimation. Note that the variable \(Efficiency_{i}\) in Eq. (1) is equivalent to variable \(Efficiency_{imy}\) in Eq. (3).

$$\begin{aligned} \overline{Temp}_{imyh} &= \alpha + \beta \overline{Temp}_{imy(h-1)} + \gamma Efficiency_{imy} \\ &\quad + \delta \overline{Weather}_{myh} + \lambda h + \theta_{imy} + \nu_{imyh} \end{aligned}$$
(3)

where h is a set of hour dummies, which accounts for correlations across units of analysis (Roodman, 2009). [Footnote 9] \(\lambda \) is a vector of parameters for the set of hour dummies. \(\theta _{imy}\) is the unobserved property-specific effect, while \(Efficiency_{i}\) is an observed time-invariant variable. \(\nu _{imyh}\) is the error term. The description of the variables is similar to Eq. (1) except the values are the monthly means at each hour.

The second stage entails estimation of the time-invariant parameters to retrieve the \(\gamma \) parameter from Eq. (3). To do so, we regress the composite residuals from the first stage, \(\hat{u}_{imyh}\), on the observed time-invariant variables (i.e. \(Efficiency_{imy}\)), as illustrated in Eq. (4). Since we are looking at the effect of the physical building, by excluding occupants’ behavioural effects, we assume that \(Efficiency_{i}\) is uncorrelated with unobserved property specific effects, \(\theta _{imy}\), or the error term for the second stage estimation, \(\omega _{imyh}\).

$$\begin{aligned} \hat{u}_{imyh} &= \overline{Temp}_{imyh} - \hat{\alpha } - \hat{\beta } \overline{Temp}_{imy(h-1)} - \hat{\delta } \overline{Weather}_{myh} - \hat{\lambda } h \\ &= \gamma Efficiency_{imy} + \theta_{imy} + \omega_{imyh} \end{aligned}$$
(4)
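The second stage in Eq. (4) can be sketched for the case where \(Efficiency\) is a set of BER-category dummies. The sketch assumes an intercept and one omitted reference category, in which case the OLS coefficient for each category is simply its mean residual minus the reference category's mean residual. It ignores the error structure and standard-error corrections of the two-stage procedure, and the function name and example residuals are hypothetical.

```python
from collections import defaultdict


def second_stage_gammas(residuals, categories, reference):
    """OLS of first-stage residuals on BER-category dummies plus an intercept.

    With a full set of dummies and one omitted reference category, each
    coefficient equals that category's mean residual minus the reference
    category's mean residual.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, cat in zip(residuals, categories):
        sums[cat] += u
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return {cat: m - means[reference] for cat, m in means.items() if cat != reference}


# Hypothetical first-stage residuals for two BER groups.
u_hat = [0.0, 0.2, -0.1, -0.3]
ber = ["A3-B3", "A3-B3", "D1", "D1"]
print(second_stage_gammas(u_hat, ber, reference="A3-B3"))
```

A negative value for a category indicates a lower indoor temperature than the reference group after the first-stage controls are netted out.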

In the Difference GMM approach, lagged levels are weak instruments if the coefficient on the lagged variable is close to one (Arellano and Bond, 1991), which is the case in this empirical application. Hence, we implement System GMM with a two-step estimator. Pooled ordinary least squares (OLS) and panel fixed effects specifications are commonly estimated for comparison. While both these estimators are biased and inconsistent due to the correlation between the composite error terms and lagged indoor temperature, their estimates bound the true value. In the OLS regression, the lagged temperature is positively correlated with the disturbance terms and provides a coefficient that is biased upward, whereas in the fixed effects regression, the coefficient is biased downward due to the negative sign on the transformed error.
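The bounding property described above can be checked on simulated data. The sketch below is not the paper's estimation: it generates a short panel with a known autoregressive coefficient and unit-specific effects, then compares the pooled OLS and within (fixed effects) slope estimates. All parameter values and function names are hypothetical choices for illustration.

```python
import random


def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_units=2000, n_hours=6, beta=0.8, seed=0):
    """Simulate y_t = beta * y_{t-1} + theta_i + noise for each unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        theta = rng.gauss(0, 1)  # unobserved unit-specific effect
        y = theta / (1 - beta) + rng.gauss(0, 1)  # start near the unit's long-run level
        series = [y]
        for _ in range(n_hours - 1):
            y = beta * y + theta + rng.gauss(0, 1)
            series.append(y)
        panel.append(series)
    return panel


def pooled_ols_beta(panel):
    """Regress y_t on y_{t-1}, pooling all units and ignoring theta_i."""
    x = [s[t - 1] for s in panel for t in range(1, len(s))]
    y = [s[t] for s in panel for t in range(1, len(s))]
    return slope(x, y)


def fixed_effects_beta(panel):
    """Within estimator: demean y_t and y_{t-1} by unit before regressing."""
    x, y = [], []
    for s in panel:
        lag, cur = s[:-1], s[1:]
        mean_lag, mean_cur = sum(lag) / len(lag), sum(cur) / len(cur)
        x += [v - mean_lag for v in lag]
        y += [v - mean_cur for v in cur]
    return slope(x, y)


panel = simulate()
print("pooled OLS:", round(pooled_ols_beta(panel), 3))
print("fixed effects:", round(fixed_effects_beta(panel), 3))
```

In this simulation the pooled OLS estimate lies above the true coefficient of 0.8 and the within estimate lies below it, so a consistent estimate would be expected to fall between the two.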

Individual property level estimates

Our third estimation strategy entails estimating temperature equations at the individual property level, as specified in Eq. (5). The objective of this approach is to illustrate the heterogeneity of building performance across the BER scales in contrast to the point estimates from the prior two approaches. With the building fabric constant within individual properties, our focus moves to temperature inertia. Within a single property, \(\hat{\beta }\) provides an estimate of how much heat, using temperature as a proxy, is retained within the building fabric after 1 h has elapsed while the heating system is turned off. In a property that is not being actively heated, one would anticipate \(\hat{\beta }<1\), with estimated values declining as energy efficiency declines. We utilise the same dynamic panel estimator as previously (i.e. Roodman, 2009), with the hour index (h) as the time dimension (i.e. early morning hours, \(1\le T \le 5\)) and the panel dimension represented by the number of days over which data is available (\(1\le N \le 181\)). \(\hat{\beta }\), therefore, represents an estimate of the average temperature inertia within a property. \(\upsilon _{hdmy}\) is the error term. We plot kernel densities of \(\hat{\beta }\) associated with each BER scale to illustrate both the heterogeneity of temperature inertia for a given BER rating and how the densities differ across BER scales. Kolmogorov-Smirnov tests are utilised to test equality of the estimated distributions.

$$\begin{aligned} Temp_{hdmy} = \alpha + \beta Temp_{(h-1)dmy} + \delta Weather_{hdmy} + \upsilon _{hdmy} \quad \forall i \end{aligned}$$
(5)
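
To make the structure of Eq. (5) concrete, the following sketch fits the equation to simulated data for a single hypothetical property. It deliberately swaps in ordinary least squares for the dynamic panel estimator used in this paper, which is adequate here only because the simulated data contain no day-specific effects. The record layout, parameter values, and function names are assumptions for illustration, not the authors' data or code.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate_property(n_days=181, n_hours=5, alpha=1.0, beta=0.9, delta=0.05, seed=1):
    """Hypothetical night-time records: (day, hour, indoor temp, outdoor temp)."""
    rng = random.Random(seed)
    records = []
    for d in range(n_days):
        temp = rng.gauss(20.0, 1.0)    # indoor temperature when heating stops
        outdoor = rng.gauss(8.0, 3.0)  # outdoor temperature for that night
        records.append((d, 0, temp, outdoor))
        for h in range(1, n_hours + 1):
            outdoor += rng.gauss(0.0, 0.5)
            temp = alpha + beta * temp + delta * outdoor + rng.gauss(0.0, 0.1)
            records.append((d, h, temp, outdoor))
    return records


def estimate_inertia(records):
    """Regress Temp_h on [1, Temp_(h-1), Weather_h], pairing hours within a day only."""
    by_key = {(d, h): (t, w) for d, h, t, w in records}
    rows, y = [], []
    for (d, h), (t, w) in by_key.items():
        if (d, h - 1) in by_key:  # the lag never crosses into another day
            rows.append([1.0, by_key[(d, h - 1)][0], w])
            y.append(t)
    alpha_hat, beta_hat, delta_hat = ols(rows, y)
    return beta_hat


beta_hat = estimate_inertia(simulate_property())
print(f"estimated beta: {beta_hat:.3f}")
```

Repeating this per property yields one \(\hat{\beta }\) per dwelling, which is the quantity whose distribution is then compared across BER scales.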

Results

Standard panel data estimates

Main results

We begin by presenting the estimates for a standard random-effects panel model. Table 4 presents the parameter estimates for Eq. (1), with several alternative specifications included. The main model specification is reported in column (1).Footnote 10 The coefficients associated with the BER scales have a negative sign, indicating a decrease in temperature, a proxy for a building’s heat loss, compared to the reference category of A3–B3 rated properties. With the focus being on building fabric performance, the estimated results demonstrate that building fabrics with enhanced insulation (lower thermal transmittance) exhibit improved heat retention. The absolute value of the coefficients is broadly increasing in magnitude as the BER scale value moves from A to G, with the exception of F- or G-rated properties. Only for properties rated C3 and below are the coefficients statistically different from zero. The magnitude of temperature decline is greater among the least energy-efficient properties, as one would anticipate. However, the gradient of performance decline is much less than one would anticipate. For instance, the decline in indoor temperature for D1-rated properties relative to A3–B3 properties is 0.12°C, while the point estimate detailing the decline in indoor temperature for E-rated properties is only marginally greater, at 0.15°C. BER categories of C1 to C3 relative to the A3–B3 reference category have estimates that are either of a relatively small magnitude difference or are statistically insignificantly different. BER categories of D1 or worse tend to have significant differences of a relatively greater magnitude.

Table 4 Estimates from a standard panel data model

Contrary to expectation, the magnitude of the coefficient on the F- and G-rated properties is not the greatest in absolute value. F- and G-rated properties have the lowest assessed level of energy efficiency. This result is potentially a reflection of the composition of the F- and G-rated properties in our sample. Over 83% of F- and G-rated properties had their BER assessment completed in 2014 or earlier. Also, two-thirds completed their BER assessment for the purpose of selling the property. Given the length of the intervening period and the likelihood that the properties were renovated subsequent to sale, there is a strong possibility that the BER ratings of some properties in this category are no longer valid. Given the overall number of F- or G-rated properties in the sample, at just 54, it is likely that any renovated properties will have a substantial impact on the coefficient estimate. Irrespective of the point estimate for F- and G-rated properties, the conclusion from the regression estimate is that a significant difference remains, relative to an A3–B3-rated property, of a magnitude that is similar to D1–E2-rated properties.

While the energy efficiency parameter estimates are of primary interest, the coefficient estimate on lagged temperature (\(\beta \) in Eq. 1) is also noteworthy. The estimate at 0.91 indicates that in the absence of heating, the indoor temperature at any hour will be approximately 90% of the temperature level an hour earlier with other factors such as thermal efficiency and external weather accounting for the balance. The hourly indoor temperature rises with outdoor temperature and humidity, but it declines as wind speed increases.
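
As a back-of-envelope illustration of what this persistence implies, Eq. (1) can be substituted into itself k times. This is a sketch rather than a result reported in the tables: it assumes the lagged term enters linearly, simplifies the subscripts to the hour index h, and uses \(X_{h}\) as shorthand introduced here for all remaining terms of the equation.

$$\begin{aligned} Temp_{h} = \beta ^{k}\, Temp_{h-k} + \sum _{j=0}^{k-1} \beta ^{j} X_{h-j}, \qquad 0.91^{5} \approx 0.62 \end{aligned}$$

Over a five-hour night-time window, roughly 62% of the starting temperature level therefore carries through via the lagged term alone, with the accumulated contributions of the other terms making up the rest.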

Sensitivity analysis

To investigate the robustness of the model estimates reported in column (1), the same model specification is re-estimated for various sub-sample categories and reported in columns 2–4 of Table 4. The pattern described above remains broadly the same: BER categories of C1 to C3 have estimates that differ negligibly from the A3–B3 reference category. In properties rated D or lower, the estimated differences are of greater magnitude.

These sensitivities were chosen to rule out any possible confounding factors influencing our analysis. In column (2), properties where the BER assessment was completed after December 2019, which is the starting point for the smart thermostat data in our analysis, are excluded. The rationale for this is that recently assessed homes may have had an energy-efficiency renovation during the period of the smart thermostat data collection. Excluding these observations precludes this situation. In this instance, the sample drops to 492 properties. Broadly, the estimates are similar to those in column (1) though the coefficients on the BER variables have roughly doubled in magnitude. The largest coefficient, on E-rated properties, is \(-\)0.25 relative to \(-\)0.15 in column (1). The pattern observed in column (1) prevails: there is a statistically significant drop in temperature across the BER scales relative to the reference category, with the difference growing as rated energy efficiency declines subject to the same caveat for F- and G-rated properties. The differences among grades C3 or lesser are less than the differences among grades E1 or greater, although the distinction is less clear in this specification.

The results in column (3) exclude properties where the BER assessment was for retrofit grant support from December 2019 onward. BER assessments for grant applications occur after renovation works are completed. In the case where the BER assessment occurred from December 2019 forward, it is possible that the smart thermostat data could cover both before and after the retrofit work. The coefficient estimates on the BER scales in absolute value are somewhat greater than those in column (1) but less than those in column (2). The pattern from column (1) emerges once again: BER categories of C1 to C3 tend to have either insignificantly different degrees of performance or significant differences of relatively small magnitude. BER categories of D1 or worse tend to have significant differences of a relatively greater magnitude.

Table 5 First stage regressions for GMM models
Table 6 Second stage regressions for GMM residuals
Fig. 2 Density of estimates of lagged temperature coefficients at individual property level

The purpose of some BER assessments is for the sale of the property. New property owners often undertake renovation works, some of which could change the energy efficiency status of the property. For instance, the likelihood of fuel system upgrades is much higher when occupancy changes (Curtis and Grilli, 2021). It is feasible that renovation works were completed, but an updated BER assessment was not undertaken and registered. In such circumstances, the BER rating linked to the smart thermostat data might not reflect the true BER status of the property. The results presented in column (4) exclude all properties where the BER was undertaken for the purpose of selling the property. The coefficient estimates on the BER scales in this instance are broadly similar to those in column (1), though only three of the BER coefficients are statistically different from the reference category. Nevertheless, the pattern observed in column (1) prevails once more, with the group showing little difference from the reference category extending to include D1-rated dwellings. We see in column (4) that BER categories of C1 to D1 tend to have either insignificantly different degrees of performance or significant differences of relatively small magnitude. BER categories of D2 or worse tend to have significant differences of a relatively greater magnitude.

While there are some small differences in the coefficient estimates across columns 1–4, they are broadly similar. Focusing on the BER coefficient estimates, those in column (2) are the largest in magnitude, but still, the hourly drop in temperature is less than 0.25°C irrespective of BER rating relative to the most energy efficient A3–B3 rated properties within the sample. It is feasible that secondary heating sources (e.g. open fire, plugged electric heaters) operate for some time after the main heating system (gas or oil boiler) is turned off. To account for this, we re-run the same model specifications but restrict our analysis to hours after 2:00 a.m., when secondary heating sources are even less likely to be in use. Results are reported in the Appendix in Table 7 and are broadly the same as those in Table 4. Several other models were estimated based on various sub-samples, for example weekends or weekdays, and excluding cases of high (> 25°C) or low (< 15°C) temperatures, with parameter estimates broadly similar to those reported here. The robustness of the estimates across the different samples highlights that neither retrofits undertaken within the analysed period nor properties with atypical heating profiles have a disproportionate impact on the estimates.

Arellano-Bond type dynamic panel estimates

Column 3 in Table 5 presents the first stage System GMM estimates, with the OLS and fixed effects estimates provided for comparison as noted earlier. Also reported in Table 5 are tests of the validity of the GMM models, including first- and second-order serial correlation tests and a Hansen test of over-identifying restrictions. The AR(1) test indicates the presence of first-order correlation in the residuals, supporting the argument that the error terms contain unobserved property specific effects. The AR(2) tests fail to reject the null hypothesis that the differenced errors in periods ‘h’ and ‘h-2’ are uncorrelated, indicating that a second lagged value is a valid instrument. Also, Hansen’s test statistic indicates the validity of the instruments.

Table 6 presents the results of the second-stage regressions for GMM residuals. The relative patterns of temperature decline across the BER scales broadly match those of the standard random-effects estimations in Table 4, although the estimated coefficients from the System GMM exhibit a greater magnitude. The negative estimated coefficients on the dummies for BER scales indicate temperature declines relative to the A3–B3 reference category. BER categories of C1 to C3 tend to have either insignificantly different degrees of performance or significant differences of relatively small magnitude. BER categories of D1 or worse tend to have significant differences of a relatively greater magnitude. Indeed, the difference between relatively high (i.e. C1–C3) and relatively low-performing (i.e. D1–G) BER categories is more pronounced when assessed using the Arellano-Bond type estimator. Columns 2–4 comprise estimates based on different sub-samples of our data, similar to those discussed in the ‘Sensitivity analysis’ section. In terms of magnitude, the estimated parameters from the GMM residuals are larger than those from standard random-effects, with E-rated properties showing a relatively large decline (a mean of 0.62°C drop per hour relative to the reference category).

Individual property level estimates

The estimates at individual property level are presented graphically in Fig. 2. The distribution of the coefficients on the lagged indoor temperature is shown with separate density plots associated with each BER category. This provides insight into the heterogeneity of performance within BER categories.

Figure 2 clearly demonstrates that there is greater within-BER heterogeneity than between-BER heterogeneity. While the great majority of properties have estimated coefficients in the range 0.8–0.95, there are many properties with estimated coefficients below 0.8. Ex-ante, one would have anticipated a clearer difference in the mean performance between property types. However, there is no distinct pattern when observing these plots, further emphasising the findings of the preceding analyses. In addition, Kolmogorov-Smirnov tests fail to reject equality of distributions between each of the BER scales. Similar results arise when the samples are restricted in a similar way to those discussed in the ‘Sensitivity analysis’ section.
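
For readers unfamiliar with the two-sample Kolmogorov-Smirnov comparison, the sketch below computes the test statistic (the largest gap between two empirical distribution functions) and the standard large-sample critical value. It is an illustrative implementation under stated assumptions, not the routine used in this paper: the coefficient values are made up, the function names are hypothetical, and the large-sample critical value is only a rough guide for samples as small as those shown.

```python
import math


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:  # step past every value equal to x (handles ties)
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical per-property coefficients for two rating bands (made-up values).
band_c = [0.93, 0.88, 0.91, 0.79, 0.90, 0.86, 0.94, 0.83]
band_e = [0.92, 0.85, 0.89, 0.77, 0.91, 0.84, 0.90, 0.81]
d = ks_statistic(band_c, band_e)
reject = d > ks_critical(len(band_c), len(band_e))
print(f"D = {d:.3f}, reject equality at 5%: {reject}")
```

Equality of the two distributions is rejected only when the statistic exceeds the critical value, so heavily overlapping densities such as those in Fig. 2 would be expected to produce a failure to reject.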

Discussion

Two striking results emerge from our analysis. First, we find that the slight variation between BER categories is outweighed by the variation within BER categories. Second, we observe a lower-than-anticipated gradient of performance across different BER categories.

Table 3 and Fig. 2 show that the mean performance of properties across BER scales is broadly similar and is overshadowed by within-category variance. This suggests that factors other than BER have an overwhelming influence on building fabric performance. Our analysis considers temperature changes in the main living space, which may vary considerably between dwellings with different BER scales. However, there is no reason to believe that there is a systematic difference in the distribution of these factors across BER categorisations and a difference in mean performance should still prevail. A substantial share of properties across all BER ratings performs relatively strongly in terms of temperature inertia, while another substantial share of properties across all BER ratings performs relatively poorly.

While BER may be a good standardised approach to measure potential performance across properties, these results suggest that there are additional factors to be considered when evaluating energy use within the home. From national energy statistics, we know that fossil fuel use per household has declined by more than 28% since 2002. This presumably can be attributed to an extensive program of residential energy retrofits plus higher building standards. Results from this and similar papers in the literature (e.g., Coyne and Denny, 2021) provide evidence to suggest that relying on theoretical energy performance certificate data may lead to misspecification of actual energy performance in the home. This insight was achieved through the use of ex-post data analysis, both in the case of this paper and that of Coyne and Denny (2021), motivating the incorporation of such data into a more comprehensive energy performance evaluation going forward. In line with this, recent studies are utilising data-driven approaches (e.g. Amasyali and El-Gohary, 2018, Bourdeau et al., 2019, Mutani and Todeschi, 2021, Pasichnyi et al., 2019, Wenninger and Wiethe, 2021) in order to improve the prediction accuracy of EPCs and overcome the limitations associated with projections from engineering models.

This paper also finds that the performance gradient between BER categories is less than expected ex-ante. While previous research has demonstrated how energy retrofits within the Irish housing stock lead to a reduction in energy consumption (Beagon et al., 2018; Coyne et al., 2018; Rau et al., 2020), none of these studies examine the gradient of performance between BER scales. Broadly consistent with the results here, Coyne and Denny (2021) find a lack of variation in average metered energy use across BER categories among 10,000 Irish properties and conclude that energy demand is unresponsive to the energy efficiency rating of properties. The Irish building energy performance standard, BER, is consistent with EU guidance, and similar differences between theoretical and actual residential energy performance have been identified elsewhere (Majcen et al., 2013; Van den Brom et al., 2018; Cozza et al., 2020).

Our research reveals that the energy performance gap persists even after excluding the influence of occupants’ behaviour, which is widely recognised as a significant factor contributing to the disparity between projected and actual energy consumption (e.g. Aydin et al., 2017, Fowlie et al., 2018, Gillingham et al., 2020, Sorrell et al., 2009, Sunikka-Blank and Galvin, 2012). Our findings support the argument that, apart from occupants’ behaviour, there are additional factors that contribute to the energy performance gap in buildings. For instance, the use of nationally specified default thermal transmittance values in the absence of actual data leads to an overestimation of energy savings resulting from refurbishments (Ahern and Norton, 2020; Raushan et al., 2022). This is particularly evident as these default values (worst-case default values) are often drawn from building codes and regulations during the time of construction and do not consider significant building fabric upgrades in older dwellings (Ahern and Norton, 2020). Furthermore, challenges arise due to variations in assessors’ approaches (e.g. Christensen et al., 2021, Crawley et al., 2019, Hardy and Glew, 2019) and discrepancies in methods and data input quality (Semple and Jenkins, 2020; Li et al., 2019).

The research presented in this paper and the findings of Coyne and Denny (2021) suggest that achieving a policy target of retrofitting 500,000 properties to a B2 BER standard (CAP, 2021) may not necessarily lead to the same degree of energy savings as predicted ex-ante. This has important implications for the efficient allocation of public funds, with €8 billion earmarked for residential energy retrofits in Ireland (CAP, 2021). The findings of this paper and others in the literature suggest that there are considerable deficiencies in the design of energy performance certificates, with scope for greater emissions reduction per unit of funds spent through a more representative measure of energy performance.

Further efforts such as improving the features of BER to more accurately reflect the actual energy usage and insulative properties of building materials may be necessary to enhance the effectiveness of BER. Achieving this would involve integrating real-time or historical energy usage data into model projections. This should provide a better representation of a building’s energy consumption instead of relying solely on standardised values for factors such as the number of occupants and energy use schedules. In addition, it is important to consider replacing unrealistic worst-case default values with values that are representative of actual dwelling stocks. This can be achieved by auditing assessors who frequently depend on default values and ensuring transparency in the calculation of these values (Raushan et al., 2022). These measures will ensure that the values used in the assessments truly reflect the characteristics of the buildings under evaluation.

It is likely to be practically and administratively difficult to design and implement a subsidy scheme that is directly linked to improved energy and emissions performance. What is more feasible is the development of national surveys with appropriate samples and statistical analysis to understand the relationships between energy efficiency standards, energy retrofits, energy use, and occupant use and behaviours. With more comprehensive information, retrofit grant schemes can be regularly reviewed to ensure the most efficient use of public funds.

Fig. 3 Mapping the BER scales (A1-G) across BER in kWh/m²/year of Irish homes

Fig. 4 Illustration of data used in the analysis

Conclusions

Energy performance certificates are widely used as a benchmark of performance against which residential investment in energy efficiency is measured. Indeed, energy performance certificates form the basis of national programs of energy efficiency in order to meet climate targets. While energy performance certificates do not purport to be a projection of occupants’ actual energy usage, they are used as the basis for public policy.

Our study employs an innovative approach to examine the heat retention and heat loss characteristics of buildings with different energy performance certificates. By using a data-driven method, indoor temperature change as a proxy variable for heat loss, and excluding occupant behaviour, we provide insights into the effectiveness of buildings’ energy performance certificates in terms of heat retention and the energy performance gap attributable to discrepancies in building fabric performances. Our results support earlier findings by Coyne and Denny (2021), who also fail to find a distinct gradient in performance between BER ratings, lending evidence to suggest that BER is not as strong an indicator of building fabric performance as one would expect ex-ante. In addition, we find that there is a wide heterogeneity of building fabric performance within BER grades, to the extent that this is far greater than between-BER heterogeneity.

Two key policy implications follow from this research. Firstly, more research is required to improve our understanding of the relationship between energy efficiency standards, energy use, and occupant behaviour. Using national surveys with appropriate samples combined with data from smart meters, data loggers, and other devices controlling heating systems, a substantially better understanding of energy use is feasible.

Secondly, many national policies frame energy efficiency objectives relative to a particular energy performance standard, as measured by energy performance certificates. While energy efficiency retrofits will invariably reduce residential energy use, this research finds that the Irish energy performance certificate captures a relatively small degree of total heterogeneity in energy use. Occupant behaviour is widely attributed as contributing to the discrepancy between projected and actual energy consumption, but this research finds that factors beyond occupant behaviour are responsible. For instance, the use of default thermal transmittance values in BER assessments may play a critical role (Raushan et al., 2022). Consequently, directly linking policy targets to a given energy performance certificate standard may lead to energy use and emissions outcomes substantially different from those envisaged.