Buildings account for 40% of EU energy use (European Parliament and Council 2010), and it is projected that 75–90% of the EU building stock will still be standing in 2050. Therefore, improving buildings’ energy efficiency (EE) is among the top-priority measures that can contribute to meeting the EU’s commitment to reduce energy consumption and greenhouse gas emissions.Footnote 1 From the perspective of mortgage lenders and investors, who have shown growing interest in “green”, “sustainable”, and “energy-efficient” products in recent years, investing in building performance improvements seems to be an attractive market segment.Footnote 2 Support comes from studies around the world documenting that homebuyers and commercial investors recognize the contributory value of increased energy efficiency. Both groups of buyers demand a greater discount for less energy-efficient buildings and price energy certifications into property values.Footnote 3

While the positive relationship between EE and sales prices is well documented, it is less obvious whether EE has an impact on borrower credit risk. Understanding the importance of EE to mortgage default is critical. If EE provides significant information to predict mortgage default, then banks or other financial intermediaries (such as fintech companies) should incorporate this information into their credit scoring models. With this paper, we aim to shed light on this aspect by focusing on the residential mortgage market and examining the correlation between EE and the probability of default (PD).

There are three potential channels that might drive the results: (i) personal characteristics of the borrowers captured by the choice of an EE building (e.g., environmental consciousness); (ii) improvements in building performance that help free up a borrower’s disposable income through lower utility bills and thus reduce default risk; and (iii) the positive effect on the dwelling value and thus on the loan-to-value ratio (LTV), which lowers default risk. Following the findings of An and Pivo (2020), we address the third channel by controlling for contemporaneous LTV. Regarding the first two channels, our findings suggest that energy efficiency might play a significant role with regard to default risk either due to unobservable borrower characteristics or the additional cash flow from energy savings.

We seek to contribute to this strand of research by combining a loan-level data set with information on building energy efficiency. We document whether there is indeed evidence of a reduction in default rates for borrowers living in energy-efficient buildings and whether this reduction is greater for lower-income borrowers.

We use loan-level data from the Dutch mortgage market and investigate the relationship between the energy efficiency of a building and the probability of mortgage default. Focusing on residential buildings (residential detached/semi-detached houses, apartments and terraced houses), our sample consists of mortgages issued on more than 120,000 dwellings. We complement the data set with provisional energy efficiency ratings assigned by the Netherlands Enterprise Agency (Rijksdienst voor Ondernemend Nederland, in short RVO) to all Dutch buildings that have not yet been assigned the actual energy performance certificate (EPC) rating. RVO provides rating categories for 60 combinations of building type and construction period in the Netherlands. This allows us to match the loan data with EPC ratings according to building type and construction year.

We disentangle the energy efficiency component from building-type- and age-specific effects, which are typically associated with borrower default risk, by exploiting three aspects of the data set. First, because of the panel structure of the data covering the period January 2014 to May 2018, we can construct a time-varying building age category variable that can be used as a control for age while still maintaining variation in the EE variable within specific age categories. Second, the construction years in the sample span 1992, the year in which the Dutch Building Decree (Bouwbesluit) came into force requiring better roof and floor insulation for newly built dwellings. This legislative order led to an immediate improvement in EPCs across all building types, so the improvement in EE can be interpreted as a consequence of technological progress rather than a linear function of time. Third, we take advantage of the fact that some EPC ratings change asynchronously over time across the different building types, i.e., for some building types the EPC rating improves earlier than for others. This leaves some degree of variation that cannot be captured exclusively by type or year of construction.

We employ two empirical methodologies – logit regression and the extended Cox model – and find that energy efficiency is negatively related to a borrower’s likelihood of defaulting on mortgage payments. The results hold when we account for borrower, mortgage, and market control variables. The findings survive a series of robustness checks. This implies that the energy efficiency rating picks up previously unobserved building and borrower characteristics and/or cash savings that are not captured by the usual variables considered for credit scoring.

Furthermore, we show that buildings’ EE better mitigates the default risk of lower-income borrowers. Therefore, we provide evidence for the relevance of the economic channel (on top of the personal characteristics), i.e., that savings coming from reduced costs (i.e., energy bills and insurance costs) might have an impact, at least for borrowers with less disposable income.

In the remainder of this paper, we first provide a more detailed account of the recent developments in the energy rating landscape and present related literature in “Background and Related Literature”. In “Data, Energy Efficiency Definition, and Statistics”, we explain the construction of the data set and present relevant descriptive statistics. In “Methodology”, we outline the methodology. “Empirical Results” discusses the results and “Conclusion” concludes.

Background and Related Literature

This section places this work in context with the historical development of the energy efficiency assessment landscape and previous studies related to energy efficiency as well as the associated findings.

Historical Background on Energy Efficiency Ratings

Over the last three decades, the building sector has experienced rapid growth in the implementation of energy-efficient building technologies. Nevertheless, the current renovation rate for energy efficiency in Europe is about one percent per year, which means that a timely transition to the legally binding target of net zero greenhouse gas emissions by 2050 cannot be guaranteed at this level (Economidou et al. 2019).

In order to make such improvements comparable between different buildings, the energy efficiency components of a building need to be measured, evaluated, and combined into an easily interpretable indicator, i.e., a rating. Currently, the landscape of rating systems is quite diverse. In the United States, for example, various energy efficiency certifications coexist and compete with each other. In Europe, on the other hand, the energy performance certificate is well known but the information inherent in it varies from country to country. In Germany, for instance, two definitions, an energy-consumption and an energy-demand perspective, co-exist under the same EPC label (see Weiss et al. 2012). This provides a challenging research environment for the question at hand: what is the relation between building energy efficiency and mortgage default risk? The answer to this question has the potential to unlock benefits for borrowers, lenders, and investors alike.

In the United States, the history of energy efficiency labels dates back to the early and mid-1980s, when Alaska and California took the first steps towards improving the efficiency and affordability of housing (see Farhar et al. 1997). About a decade later, in 1995, the non-profit organisation Residential Energy Services Network (RESNET) took the initiative to develop the Home Energy Rating System (HERS)Footnote 4, and the governmental Environmental Protection Agency (EPA) introduced the ENERGY STAR certification programFootnote 5 for newly constructed single-family homes. At the same time, the government-owned National Renewable Energy Laboratory (NREL) initiated a pilot program that would introduce a new financial product called the “energy-efficient mortgage” and link it to a building’s energy efficiency. Once the mortgages were distributed, the next step was to evaluate the program. Among other goals, the evaluation phase was intended to analyse to what extent a link exists between buildings’ energy efficiency and the mortgage probability of default (see Farhar et al. 1997; Farhar 2000). The results from this analysis would have been the first of their kind. However, the study was either not conducted or not published. The reasons for this remain unknown. In the years since, none of the published energy efficiency reports have been able to provide a thorough analysis either. Problems with data availability were reported as the main reason for this research gap (see, e.g., Hammon 2005).

In Europe, Denmark and the United Kingdom were among the first countries to conduct energy efficiency assessments of buildings in the 1970s and 1980s, respectively. In the early and mid-1990s, various European countries introduced mandatory energy efficiency requirements, which were accompanied by the development and implementation of corresponding rating systems. To name a few, in the UK, BREEAM (Building Research Establishment Environmental Assessment Methodology) and NHER (National Home Energy Rating Scheme) were both introduced in 1990. In Ireland, ERBM (Energy Rating Bench Mark) was created in 1992, while in the Netherlands the energy performance of buildings has been measured since the mid-1990s. In 2002, the EPC was introduced as a requirement for the member states of the European Union by the Energy Performance of Buildings Directive (see European Parliament and Council 2002). As a result, all member states and some other European countries have established national building rating policies during the past two decades. However, despite these initiatives, the use of European energy rating information for research on the financial performance of real estate is rather rare. This paper is among the first attempts to shed light on this issue.

Literature on Energy-Efficient Buildings and Mortgage Default Risk Drivers

In the traditional loan origination business, the default risk of applicants for consumer and mortgage loans is usually assessed through the use of credit scores. These scores are the result of a statistical model that maps an applicant’s characteristics to the probability that he or she will default on the loan. The lender uses the prediction of an applicant’s likelihood of default to determine the volume of credit granted and the interest rate charged. Typically, the input variables of a statistical model include behavioural, financial, and demographic information. These are usually supplemented by loan-specific characteristics, such as the LTV ratios for mortgage loans. Credit scoring methods are continuously refined, either by introducing new models or by adding new variables or characteristics.

An important question for both practitioners and researchers is whether the inclusion of the mortgage-specific attribute “energy-efficient” or “green” in the lender’s scoring model adds value. The theoretical motivation is that mortgages issued on energy-efficient houses should carry lower risk than mortgages issued on less efficient houses. The reasoning is that a borrower’s savings on energy costs leave more disposable income available in case of emergencies or unexpected events. Burt et al. (2010) argue that house ratings can accurately predict annual energy costs, which should translate into lower default risk. That is, energy efficiency frees up a part of the borrower’s income, which improves the ability to repay debt.

However, actual research on this topic is limited. To date, only a few studies have been conducted in this research area, and all of them rely exclusively on residential and commercial mortgage data from the United States. We contribute to this strand of literature by focusing on the Dutch residential mortgage market.

A recent study on the relationship between energy efficiency and the probability of default of residential mortgage loans was conducted by Kaza et al. (2014). The authors use information on about 71,000 loans originated between 2002 and 2010 in the United States for owner-occupied single-family homes.

Their findings show that ENERGY STAR certified dwellings are associated with substantial and significant reductions in default and prepayment risk. The authors argue that the lower risk associated with energy efficiency could be due to energy savings or simply because borrowers with energy-efficient homes are financially better off than those with less efficient homes.Footnote 6

To date, the literature has examined the relationship between energy efficiency and mortgage default risk without analyzing the economic channels. A notable exception is the work of An and Pivo (2020) on commercial buildings. The authors examine the relationship between energy-efficient buildings that carry an ENERGY STAR or LEED label and the corresponding commercial mortgage default risk. The underlying loan sample consists of approximately 6,300 commercial mortgages originated between the years 2000 and 2013 in 17 Metropolitan Statistical Areas across the United States. The results show that traditional default predictors do not fully reflect the financial benefits of energy efficiency, suggesting that loans on ENERGY STAR or LEED certified buildings are 34% less likely to default than their non-certified counterparts. The authors provide evidence of a green premium that reduces the default risk of energy-efficient buildings in two ways: a green rent component that increases the mortgage debt service coverage ratio (DSCR); and a green value component that increases the owner’s equity position and reduces LTV. A higher DSCR and a lower LTV are both associated with lower default risk. In our paper, we instead consider residential mortgages and the role of household income in addition to mortgage variables.

In addition to using energy efficiency characteristics alone, studies have shown that buildings with higher sustainability ratings are also less susceptible to default risk. By analysing data sets on residential and multifamily homes, Rauterkus et al. (2010) and Pivo (2014) find that sustainability features such as building location, transportation access (e.g., closeness to freeways, subways, work) or housing affordability also play an important role in borrowers’ ability to repay their debt.

Other studies focus on personal characteristics that are oriented towards the choice of energy-efficient alternatives without considering them as potential risk drivers in the relationship with mortgages. In fact, in most cases, personal characteristics, such as beliefs and attitudes, are not available and are therefore considered hidden. Black et al. (1985) emphasize that the effects of sociodemographic variables on energy efficiency-oriented behaviour are in most cases indirect and influenced by social norms (awareness of social consequences of energy efficiency), moral norms (concerns about the energy problem), and personal norms (perceived self-interest such as energy bill savings). More recently, Busic-Sontic et al. (2017) have shown that three out of the five personality traits, namely openness to experience, extraversion, and agreeableness, are significant predictors of EE investment.Footnote 7 Finally, Lange and Dewitte (2019) show that cognitive flexibility can explain differences in pro-environmental behaviour.

To summarize, current literature on the direct relationship between energy efficiency and default risk is sparse and focuses on the U.S. housing market. Moreover, only the study of Kaza et al. (2014) employs residential mortgage data to investigate the impact of energy efficiency. The presented results are supportive of a significant and inverse relationship between energy efficiency and mortgage default risk. In this paper, we contribute to this relatively recent strand of literature by focusing on the Dutch residential mortgage market and confirming that EE building standards are related to lower mortgage default rates. We therefore suggest that also for Europe, there is a relationship between EE ratings and default rates that should be incorporated into lenders’ risk assessment models. We also provide evidence that the EE rating not only subsumes potential unobservable borrower characteristics, but also captures the effect of lower annual energy costs for lower-income borrowers on their economic ability to repay mortgage debt.

Data, Energy Efficiency Definition, and Statistics

This section details the construction of the data set used in the analysis. The steps include (i) the selection and aggregation procedure of the loan-level data, (ii) the definition of energy-efficient building ratings, and (iii) the methodology of merging the two data sets. In addition, we present the selection of variables for the analysis and the respective summary statistics.

Data and Sample Selection

In the following analysis, we employ Dutch mortgage data obtained from the European DataWarehouse (ED).Footnote 8 ED provides a comprehensive data set with periodically updated dynamic and static individual loan-level information on securitized European mortgages.Footnote 9 We restrict the data sample according to the following criteria. The sample period covers January 2014 to May 2018 and the country of assets is limited to the Netherlands. The type of borrower is “individual” and the primary income is between EUR 20,000 and 1,000,000. The property type is “residential detached/semi-detached house”, “apartment”, or “terraced house”. The building’s occupancy type is restricted to “owner-occupied” and the construction year of the buildings is between 1900 and 2016. In addition, we focus only on fixed-interest rate mortgages and exclude repurchased ones. Finally, we require that each borrower be associated with exactly one building and vice versa. Appendix A provides an overview of the variables selected for the analysis.
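The selection criteria above amount to a row-level filter. The following minimal sketch illustrates this; the field names and category codes are hypothetical and do not correspond to the actual ED field codes, the income bounds are assumed to be inclusive, and the sample-period and one-to-one borrower-building requirements are omitted for brevity.

```python
from dataclasses import dataclass

ALLOWED_PROPERTY_TYPES = {
    "residential detached/semi-detached house",
    "apartment",
    "terraced house",
}


@dataclass
class LoanRecord:
    # Hypothetical field names, not the ED field codes.
    country: str
    borrower_type: str
    primary_income: float  # EUR per year
    property_type: str
    occupancy_type: str
    construction_year: int
    interest_rate_type: str
    repurchased: bool


def passes_selection(loan: LoanRecord) -> bool:
    """Return True if the record satisfies the sample criteria listed in the text."""
    return (
        loan.country == "NL"
        and loan.borrower_type == "individual"
        and 20_000 <= loan.primary_income <= 1_000_000  # inclusive bounds assumed
        and loan.property_type in ALLOWED_PROPERTY_TYPES
        and loan.occupancy_type == "owner-occupied"
        and 1900 <= loan.construction_year <= 2016
        and loan.interest_rate_type == "fixed"
        and not loan.repurchased
    )
```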

After applying the above selection criteria, our final data set totals 273,024 individual mortgage components that are associated with 127,309 individual buildings. The discrepancy between the number of mortgage components and the number of underlying buildings results from the Dutch-specific tax treatment of mortgages. A typical Dutch mortgage loan consists of multiple loan parts, e.g., a bank savings loan part that is combined with an interest-only loan part. This is more common for mortgages originated prior to 2013 when there was a specific tax preference for interest-only mortgages. Besides the tax reasons, the number of mortgage components may go beyond two if a borrower takes out an additional mortgage on the same building at a later date.

For the analysis, we aggregate loan component information at the building level. For certain variables, this is already done by the data provider. For example, variables such as LTV or debt-to-income (DTI) are available at the borrower level (i.e., the same value is reported for each loan component). Where necessary, we compute for each building the average variable value across loan components weighted by the loan component’s original balance. “Choice of Variables and Summary Statistics” provides further details on this aggregation procedure.
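As an illustration of this aggregation step, the sketch below computes an original-balance-weighted average across the loan components of one building. The function name and the numbers are illustrative assumptions, not values from the data set.

```python
def balance_weighted_average(values, original_balances):
    """Average of component-level values, weighted by each component's original balance."""
    total = sum(original_balances)
    if total <= 0:
        raise ValueError("total original balance must be positive")
    return sum(v * b for v, b in zip(values, original_balances)) / total


# Hypothetical building with two loan parts (illustrative numbers only).
terms_in_months = [360, 240]      # loan term of each loan part
balances = [150_000.0, 50_000.0]  # original balance of each loan part
building_term = balance_weighted_average(terms_in_months, balances)
```

In this example the larger loan part dominates, so the building-level term lies closer to 360 than to 240 months.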

Defining Energy Efficiency

For the classification of buildings into different energy efficiency categories, we rely on the Dutch energy performance reference table prepared by the RVO.Footnote 10 The intention behind this table is to determine a provisional energy label for all existing Dutch residential buildings. The provisional EPC indicates the energy performance of a reference building that was developed using cadastral data (i.e., area, date of construction, building type, quality of insulation of floors, roof and walls, and systems for heating, hot water, and renewable energy) of the Dutch residential building stock. The dwelling owners are encouraged to modify or add additional information about energy improvement measures, which must be approved by a qualified expert before being posted on the website. In this respect, owners must also provide evidence of the measures carried out, such as invoices and photographs. The qualified expert reviews the uploaded changes and documents before approving the definite EPC. Based on this approval, the provisional EPC is finally replaced by the actual, new EPC, which is registered at RVO. This final EPC is based on a national calculation method that takes into account the retrofit measures carried out by the owner of the dwelling.

In the provisional rating table, the label classes are calculated as described in RVO (2014). This document relies on studies conducted on the Dutch residential market, including Boumeester et al. (2008), Agentshap NL (2011), and Agentshap NL (2013). For the provisional label, the RVO has drawn up 60 reference buildings that serve for determining the provisional energy label.Footnote 11 The ordinal rating scale ranges from G (lowest energy efficiency) to A (highest energy efficiency). For each type of dwelling and year of construction, the most common characteristics of the house were studied in terms of flooring, roof, heating and ventilation system, presence of solar panels, etc., and each characteristic was assigned a code. Then, based on these properties, the approximate energy consumption of the reference dwelling was calculated and a corresponding energy label between G and A was assigned.

It should be noted that the provisional label assignment procedure is less precise than the assignment of actual EPC ratings. Unfortunately, the actual EPC ratings are not available. Therefore, a measurement error may be present. The measurement error can occur in two ways: randomly or systematically. If the error is random, then it should be averaged out due to the size of the data set.Footnote 12 If the error is systematic in nature, then it should be attributed to one of two possible sources: either (i) the energy efficiency of buildings systematically improved after construction (e.g., due to homeowner participation in large-scale retrofit programmes) or (ii) it deteriorated. We can rule out systematic rating deterioration because it is irrational for a large base of homeowners to deliberately reduce their buildings’ energy efficiency or to neglect building maintenance. On the other hand, a systematic improvement in energy efficiency is a legitimate concern.Footnote 13 However, if actual energy ratings improve after a building was constructed, then our results should be placed at the lower end of the estimation spectrum, meaning that our findings are likely to underestimate the true relationship between EE and PD. As in other studies that have only EPC ratings available, actual energy consumption remains uncertain. Namely, buildings with the same energy rating do not necessarily have the same gas or energy consumption, which depends on the individual consumption patterns of borrowers. Consequently, consumption behaviour usually is treated as idiosyncratic.

Table 1 provides an overview of the final energy classification for the analysis. It is evident that energy efficiency has improved over time, with the most efficient buildings built after 2006. Furthermore, it should be noted that the construction year periods are not of equal length. This means that energy efficiency improvement is not a linear function of the year of construction. It is technological progress and legislation (and not simply time) that are the driving factors determining rapid energy efficiency improvement during certain periods. In particular, the Building Decree (Bouwbesluit), which came into force in 1992, stands out as an important piece of legislation requiring better roof and floor insulation for newly built dwellings. Its impact can be observed in the rating improvements from C to B between the 1988-1991 and 1992-1999 periods. As explained in the next section, we take advantage of the panel structure of the data and define a variable for building age that allows us to capture the EE effect arising due to the rapid improvement in energy efficiency (i.e., technological progress) between the years 1991 and 1992. In addition, we can observe that some ratings do not change simultaneously across property types and construction years. This feature allows us to decouple the energy efficiency component from the construction year and type effect when examining the degree of energy efficiency in Appendix B.

Table 1 This table presents the energy rating distribution across property types and construction years
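The matching of loans to provisional labels by property type and construction period can be sketched as a simple range lookup. The two rows below are a hypothetical excerpt: only the step from C to B between the 1988-1991 and 1992-1999 periods is taken from the text, while the property type attached to these rows is an illustrative assumption. The dummy function encodes the A-or-B definition of an energy-efficient building used in this paper.

```python
# Hypothetical excerpt of a reference table:
# (property type, first construction year, last construction year, label).
REFERENCE_TABLE = [
    ("terraced house", 1988, 1991, "C"),
    ("terraced house", 1992, 1999, "B"),
]


def provisional_label(property_type, construction_year):
    """Return the provisional label for a building, or None if no row matches."""
    for p_type, first_year, last_year, label in REFERENCE_TABLE:
        if p_type == property_type and first_year <= construction_year <= last_year:
            return label
    return None


def is_energy_efficient(label):
    """EE dummy: 1 for an A or B rating, 0 otherwise."""
    return 1 if label in ("A", "B") else 0
```

Two otherwise identical dwellings completed in 1991 and 1992 would thus receive different labels, which is the discontinuity the identification strategy relies on.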

Table 2 presents the energy rating distribution of all buildings in the sample, and Table 3 reports the building distribution across Dutch provinces. In both tables, a mortgage on a building is marked as defaulted if at least one of the mortgage components is in arrears for at least 90 days. It can be observed that C-rated (E-rated) buildings represent the largest (smallest) bucket in the sample, while the remaining ratings are distributed reasonably evenly. Column three in Table 2 reports the percentage of defaulted mortgages within each rating category. In this context, the rising share of defaults associated with a lower energy efficiency rating is worth highlighting. Overall, the share of defaulted mortgages is rather low at 0.55%. From Table 3, we can see that the mortgages are not evenly distributed across the Dutch provinces, with the largest share stemming from Holland. Within each province, between one-fifth and one-half of buildings are classified as energy-efficient (i.e., having an A or B rating). Among defaulted loans, the share of energy-efficient mortgages is lower than that of non-efficient mortgages in every province.

Table 2 This table presents the energy rating distribution of all and defaulted Dutch loans
Table 3 This table presents the geographical distribution of all and defaulted loans according to the NUTS 2 statistical regions of the Netherlands
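The building-level default definition used in Tables 2 and 3 can be written compactly. In the sketch below, each building is represented by a list holding the days in arrears of its loan components; this representation and the function names are assumptions made for illustration.

```python
def building_defaulted(days_in_arrears_by_component, threshold_days=90):
    """A building's mortgage is in default if any loan part is at least 90 days in arrears."""
    return any(days >= threshold_days for days in days_in_arrears_by_component)


def default_share(buildings):
    """Share of defaulted buildings; each building is a list of days in arrears per loan part."""
    if not buildings:
        raise ValueError("need at least one building")
    return sum(building_defaulted(b) for b in buildings) / len(buildings)
```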

Choice of Variables and Summary Statistics

The control variables for the analyses are those identified in the existing literature as having a significant impact on mortgage default probability (see An and Pivo 2015). In particular, the variables are chosen in order to account for potential risk mitigating channels that might arise due to energy efficiency. The variables can be categorized into four different types: mortgage, building, borrower, and macroeconomic/financial variables.

Among mortgage variables, we employ contemporaneous LTV, contemporaneous DSCR, contemporaneous DTI, and loan term. LTV and DTI are reported by ED.Footnote 14 The DSCR for each building is defined as the ratio of total monthly income to total monthly periodic constant payments, where the latter is the sum of monthly periodic constant payments across all loan components associated with the building. The periodic constant payments were calculated using contemporaneous loan balance, the interest rate, and the number of periods remaining until maturity. The loan term at the building level is defined as the difference between issue date and the maturity date (measured in months) and aggregated as the original balance-weighted average across all loan components associated with that building. Using contemporaneous LTV allows us to control for the potential value channel that might arise due to energy efficiency. As pointed out by An and Pivo (2020), energy efficiency is likely to improve the dwelling’s market value, which in turn should lower the contemporaneous LTV. Chegut et al. (2020) document that Dutch energy efficient dwellings were valued significantly higher in 2015 compared to the baseline year of 2010. This suggests that the reported contemporaneous LTV ratios in our panel data set, which covers the period January 2014 to May 2018, should reflect the market value of buildings’ energy efficiency. Thus, LTV appears to be an appropriate control for the value channel and our findings should therefore be placed on top of the value-induced risk reduction effect. Regarding the income channel, i.e. the improvement in the borrower’s disposable income due to energy efficiency, neither DSCR nor DTI is an appropriate control variable because each captures only general household income, but not savings from energy efficiency. Unfortunately, we do not have access to information on borrower energy use that could address this issue. However, in “Economic Mechanism” we examine the income channel by differentiating between high-, middle-, and low-income households on top of the classical DSCR or DTI ratio used in the default probability models to capture the additional cash flow due to energy savings.
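To make the DSCR construction concrete, the sketch below computes a level monthly payment from the contemporaneous balance, the interest rate, and the remaining number of months, and then forms the building-level ratio. The standard annuity formula and the conversion of a nominal annual rate to a monthly rate are assumptions; the text does not state the exact formula used, and the treatment of interest-only loan parts may differ.

```python
def periodic_constant_payment(balance, annual_rate, months_remaining):
    """Level monthly payment that amortizes `balance` over `months_remaining` months."""
    if months_remaining <= 0:
        raise ValueError("months_remaining must be positive")
    r = annual_rate / 12.0  # assumption: nominal annual rate, monthly compounding
    if r == 0:
        return balance / months_remaining
    return balance * r / (1.0 - (1.0 + r) ** (-months_remaining))


def building_dscr(monthly_income, components):
    """DSCR = total monthly income / sum of monthly payments over all loan components.

    `components` is a list of (balance, annual_rate, months_remaining) tuples.
    """
    total_payment = sum(periodic_constant_payment(*c) for c in components)
    if total_payment <= 0:
        raise ValueError("total payment must be positive")
    return monthly_income / total_payment
```

A higher ratio indicates that income covers the scheduled payments more comfortably; as noted above, the ratio does not reflect savings on energy bills.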

Building variables include property type, geographic location at the NUTS 3 level, and building age category.Footnote 15 Building age is defined as the difference between the current loan year and the year the building was built. We categorize building age into 3-year categories because, according to Underwood and Alshawi (2000), this is the shortest maintenance cycle of a building. Due to the panel structure of our data, this variable definition allows us to disentangle the building age component from the EE component in the regressions. This is due to the fact that EE variation remains within certain age categories.Footnote 16
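The time-varying age category can be sketched as follows. The bucket boundaries (ages 0-2 in the first category, 3-5 in the second, and so on) are an assumption, since the text specifies only the 3-year width.

```python
def building_age_category(loan_year, construction_year, width=3):
    """Index of the 3-year age bucket: 0 for ages 0-2, 1 for ages 3-5, and so on."""
    age = loan_year - construction_year
    if age < 0:
        raise ValueError("loan year precedes construction year")
    return age // width


# A dwelling built in 1991 and observed in 2014 and a dwelling built in 1992
# and observed in 2015 are both 23 years old, so they share an age category
# although they fall on different sides of the 1992 Building Decree.
cat_pre_decree = building_age_category(2014, 1991)
cat_post_decree = building_age_category(2015, 1992)
```

Because buildings move through the age categories as the panel advances, dwellings with different provisional labels can be observed within the same category, which is the within-category EE variation described above.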

Borrower-level information includes total income, defined as the sum of primary and secondary income, and the borrower’s age at origination of the earliest loan component. We sort total income into high, medium, and low terciles. These variables are designed to account for household characteristics that may influence both the decision to purchase an energy efficient home and mortgage default risk. For instance, older, wealthier, and more financially literate households might be more likely to acquire a home with better energy performance while having a low-risk credit profile.
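A rank-based assignment of income terciles could look as follows. This is a minimal sketch under the assumption that terciles are formed on the cross-section of total incomes; the handling of ties (broken by input order) is likewise an assumption.

```python
def income_terciles(total_incomes):
    """Assign 'low', 'medium', or 'high' by income rank; ties are broken by input order."""
    n = len(total_incomes)
    order = sorted(range(n), key=lambda i: total_incomes[i])  # stable sort
    labels = [None] * n
    for rank, i in enumerate(order):
        # rank < n, so 3 * rank // n is always 0, 1, or 2
        labels[i] = ("low", "medium", "high")[3 * rank // n]
    return labels


# Illustrative total incomes in EUR (primary plus secondary), not sample values.
example_labels = income_terciles([30_000, 90_000, 45_000, 60_000, 25_000, 120_000])
```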

To control for general macroeconomic conditions, we include the quarterly Dutch unemployment rate, the end-of-month 10-year German government bond yields, the monthly standard deviation of 10-year German bond yields, and the yield curve slope, defined as the difference between 10- and 1-year EUR swap rates. The variables are obtained from Bloomberg.

Summary statistics at the property level are presented below. Table 4 provides summary statistics on key borrower, property, and mortgage characteristics as a one-time cross-sectional snapshot using the most recently reported values. The table differentiates between non-defaulted (Panel A) and defaulted (Panel B) mortgages. Within both panels, we also differentiate between energy-efficient (EE = 1) and energy-inefficient (EE = 0) buildings. A dwelling is considered EE if it has an A or B rating. Beginning with borrower characteristics, age at mortgage origination does not appear to differ substantially between EE and non-EE mortgages. However, younger borrowers are significantly more likely to experience default than older borrowers. In terms of income, EE-building borrowers have higher overall total household income for both defaulted and non-defaulted loans, while defaulted borrowers generally have relatively lower annual income. The construction year of buildings varies between EE and non-EE by definition. More recently constructed buildings are EE. About 68% of buildings are detached houses, while 17% are apartments and 15% are terraced houses in the sample (results are not reported in the table). Average interest rates and original LTV are higher for defaulted and non-EE mortgages. In line with the findings of An and Pivo (2020), we observe that on average LTV is lower for energy efficient dwellings, suggesting that EE is associated with higher property value.

Table 4 This table presents the summary statistics of loan and borrower variables for non-defaulted (Panel A) and defaulted (Panel B) loans, respectively

Figure 1 shows the distribution of mortgages according to the buildings’ year of construction (Panel A), the total original balance (Panel B), and the earliest origination year (Panel C). It is worth noting that our data set is well diversified according to buildings’ construction year. In addition, we have a considerable number of mortgages older than ten years. This is an important feature since defaults typically do not occur in the first few years after origination.

Fig. 1 Panel A depicts the relative frequency of buildings’ construction year. Panel B depicts the relative frequency of total mortgage original balance, defined as the sum across all loan components on the same building. Panel C presents the earliest mortgage origination year associated with a building.

Unreported statistics on market and economic variables indicate that the average quarterly Dutch unemployment rate for the period January 2014 to May 2018 is about 6.47%. For the same period, the average yield on 10-year German government bonds is 0.46%, their average monthly standard deviation is 0.095%, and the average difference between 10- and 1-year Euro swap rates amounts to 0.963%.


The industry standard for estimating mortgage default risk is the logistic regression model. Among the more sophisticated techniques, survival analysis, in particular the Cox model, is a popular alternative because it accounts for the time-series nature of a given data set. Given the panel structure of the data provided by ED, we present and employ both estimation techniques in our analysis.

Logistic Regression

An approach frequently used in the literature to study the relationship between borrower-level loan information and the probability of mortgage default is logistic regression (see, e.g., Altman and Saunders 1997; Episcopos et al. 1998; Qi et al. 2020). Its main difference from the linear regression model is that the dependent variable is a latent variable and that only the binary outcome variable Y, i.e., the default event, can be observed. At a random point in time, Y takes the value one if the event occurs and zero otherwise. The probability distribution of Y is modeled as

$$ P(Y_{i} = 1|\textbf{x}_{i}) = \frac{\exp(\beta^{\prime}\textbf{x}_{i})}{1+\exp(\beta^{\prime}\textbf{x}_{i})}, $$

where \(Y_{i}\) is equal to one if at least one of the mortgage components on building i has experienced a repayment delay for at least three months in a row, and zero otherwise. The results of logistic and logit regressions are equivalent since both are obtained through the maximum likelihood estimator. The relationship between the two is that the logistic function is also known as the inverse logit function. For ease of reading, we report logit estimates, i.e., coefficients rather than odds ratios. The vector of explanatory variables \(\textbf{x}_{i}\) contains the energy efficiency indicator \(\text{EE}_{i}\), which is equal to one if a building has a provisional energy rating A or B, and zero otherwise. All other independent variables fall into one of four categories: mortgage, building, household, or market control variables. A detailed overview of the covariates is presented in “Choice of Variables and Summary Statistics”.
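The link between the logistic function, the logit function, and odds ratios can be made concrete with a short sketch. It is illustrative only: the function names, the linear index value, and the coefficient are assumptions and not estimates from this study.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear index z = beta'x to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical linear index for one building.
z = -5.0
pd_i = logistic(z)          # modeled probability of default
round_trip = logit(pd_i)    # recovers z, up to floating-point error

# A coefficient b on a dummy variable shifts the log-odds by b,
# so exp(b) is the corresponding odds ratio.
b = -0.5                    # hypothetical coefficient
odds_without = math.exp(z)
odds_with = math.exp(z + b)
odds_ratio = odds_with / odds_without   # equals exp(b)
```

Reporting coefficients (logit) or their exponentials (odds ratios) is therefore a matter of presentation; the fitted probabilities are the same.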

Extended Cox Model

The Cox regression is a survival analysis method that aims to estimate the density function f(t) of random event times T, where the event is the default date of a loan component. Following the standard terminology, a generic survival model is represented by a survival function S(t) and a hazard function h(t), which are defined as

$$ S(t) = P(T > t) = 1 - F(t), $$

$$ h(t) = \lim_{\Delta t\rightarrow 0}\frac{P(t \le T < t + \Delta t\,|\,T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, $$

where F(t) is the cumulative distribution function of T and S(t) is a monotone decreasing function in t with the limits \(\lim _{t\rightarrow 0}S(t) =1\) and \(\lim _{t\rightarrow \infty }S(t) = 0\). The survival function models the probability that a loan will survive beyond a threshold period t. The hazard function h(t) represents the instantaneous risk that a default will occur at time t, conditional on the loan having survived up to time t. A convenient way to express the relationship between the survival function S(t) and the hazard function h(t) is to introduce the cumulative hazard function \(H(t) = {{\int \limits }_{0}^{t}} h(u) du\). Using the relationship between h(t) and S(t) given in Eq. (3), it is then straightforward to show that \(H(t) = -\ln \{S(t)\}\). The cumulative hazard function can be interpreted as the total amount of risk accumulated up to time t.
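For completeness, the step from the hazard function to the cumulative hazard can be written out. The sketch assumes that T is continuous with density \(f(t) = -dS(t)/dt\) and that \(S(0) = 1\):

$$ h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \quad \Longrightarrow \quad H(t) = {\int \limits }_{0}^{t} h(u) du = -\left[\ln S(u)\right]_{0}^{t} = -\ln S(t), $$

so that, equivalently, \(S(t) = \exp \{-H(t)\}\).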

Cox (1972) proposes a proportional hazards model with the hazard function being defined as the product of a positive baseline hazard rate \(h_{0}(t)\) and the exponential of a linear function of explanatory variables \(\textbf{x}_{i}\):

$$ h(t|\textbf{x}_{i}) = h_{0}(t)\exp(\beta^{\prime}\textbf{x}_{i}), $$

where \(\textbf{x}_{i}\) is a vector of time-fixed covariates that are associated with building i and \(\beta\) is a vector of the corresponding regression coefficients. This basic form of the Cox model can be extended to allow for time-varying covariates:

$$ h(t|\textbf{x}_{i}(t)) = h_{0}(t)\exp(\beta^{\prime}\textbf{x}_{i} + \gamma^{\prime}\textbf{x}_{i}(t)), $$

where the vector \(\textbf {x}_{i}(t) = [x_{1}, x_{2}, \dots , x_{p}, x_{1}(t), x_{2}(t), \dots , x_{q}(t)]\) consists of p time-independent and q time-dependent predictor variables.
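The proportionality property follows directly from this specification. As an illustration, consider two buildings that differ only in the energy efficiency indicator. Writing \(\beta_{\text{EE}}\) for the coefficient on \(\text{EE}_{i}\), and \(\textbf{z}_{i}(t)\) and \(\delta\) for all remaining covariates and their coefficients (labels introduced here purely for exposition), the baseline hazard cancels in the ratio:

$$ \frac{h(t|\text{EE}_{i}=1,\textbf{z}_{i}(t))}{h(t|\text{EE}_{i}=0,\textbf{z}_{i}(t))} = \frac{h_{0}(t)\exp(\beta_{\text{EE}} + \delta^{\prime}\textbf{z}_{i}(t))}{h_{0}(t)\exp(\delta^{\prime}\textbf{z}_{i}(t))} = \exp(\beta_{\text{EE}}). $$

The ratio does not depend on t, which is the sense in which the hazards are proportional. A negative \(\beta_{\text{EE}}\) corresponds to a hazard ratio below one.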

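The graphical visualization of empirical survival functions can shed light on whether the proportional hazards assumption holds. The empirical survival function is typically depicted using the Kaplan and Meier (1958) method, an accepted non-parametric approach that assumes that the censoring time is independent of an individual’s behaviour. The empirical function is defined as

$$ \hat{S}(t) = \prod\limits_{m:\,t_{m}\le t}\hat{P}(T > t_{m}|T \ge t_{m}) = \prod\limits_{m:\,t_{m}\le t}\left(1-\frac{d_{m}}{n_{m}}\right), $$

where \(t_{m}\) are the ordered event times and the conditional probabilities are approximated by the frequency distribution in the data set, with \(d_{m}\) denoting the number of defaults at \(t_{m}\) and \(n_{m}\) the number of loans at risk at \(t_{m}\).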

In survival analysis, the observation period is typically limited to a defined period of time, so it is important to differentiate between the time (i) when a subject is first at risk, (ii) when a subject is under observation, and (iii) when a subject experiences failure. A subject is said to be left-truncated if the date on which it first becomes at risk precedes the start of the observation period. In the case of loans, left-truncation applies to loans that were originated before the first observation date. Among the left-truncated loans, we observe only those that survived until the beginning of the study and have no information on loans that defaulted prior to the observation period. Left-truncated subjects can generally be included in the analysis, provided that their time of exposure to risk is taken into account when they come under observation (i.e., by accounting for the loan’s age at the beginning of the study).

A subject is said to be right-censored if the date of failure is unobservable due to either the subject’s early exit from the study or the termination of the study. For example, we cannot observe the future default date of a loan that is still being paid off at the end of the observation period. For a loan that was paid off during the observation period, on the other hand, the date of the last payment is taken as the censoring date, since a default could have occurred at some point in the future had the loan term been sufficiently long. The common practice to correct for censoring is to introduce a dummy variable for censored observations.
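How left truncation and right censoring enter a Kaplan-Meier estimate can be sketched in a few lines. This is a simplified illustration and not the estimation code used in the study: the function name, the `(entry, exit, event)` record layout, and the example loans are assumptions. The key point is that a loan counts toward the risk set only while it is under observation.

```python
def kaplan_meier(records):
    """Product-limit estimate with delayed entry and right-censoring.

    records: list of (entry, exit, event) tuples, where `entry` is the
    loan age when observation starts (0 if observed from origination),
    `exit` is the loan age at default or censoring, and `event` is True
    for a default and False for a censored loan.
    Returns a list of (time, survival) pairs at the observed default times.
    """
    event_times = sorted({exit_ for _, exit_, event in records if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # A loan is at risk at t only if it is already under observation
        # (entry < t) and has not yet defaulted or been censored (exit >= t).
        at_risk = sum(1 for entry, exit_, _ in records if entry < t <= exit_)
        defaults = sum(1 for _, exit_, event in records if event and exit_ == t)
        survival *= 1.0 - defaults / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical loans; times are loan ages in years.
loans = [
    (0.0, 2.0, True),    # observed from origination, defaults at age 2
    (0.0, 5.0, False),   # still performing at the end of the study
    (3.0, 4.0, True),    # left-truncated: enters at age 3, defaults at age 4
    (1.0, 3.5, False),   # left-truncated, paid off (censored) at age 3.5
    (0.0, 4.0, False),   # censored at age 4
]
curve = kaplan_meier(loans)
```

Ignoring the entry times would place left-truncated loans in the risk set before they were observable, which would overstate survival at early loan ages.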

Empirical Results

The following section presents the regression results and the associated robustness checks.

Logit Regression

The logit regression model is appropriate for modeling a binary outcome without considering the time dimension. Since our data set provides a quarterly time series of mortgage information, we resort to the following procedure to eliminate the time dimension. For mortgages with no default in the sample, we use the most recent quarter in the regression analysis. For defaulted mortgages, on the other hand, we identify the quarter in which the mortgage was first reported to be three or more months in arrears and use the information reported in that quarter.
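The selection rule described above can be sketched as follows. The record layout is hypothetical: the keys `building_id`, `quarter`, and `months_in_arrears` are assumptions made for illustration and are not field names from the underlying data set.

```python
def collapse_panel(rows):
    """Reduce quarterly loan records to one record per building.

    rows: list of dicts with the hypothetical keys 'building_id',
    'quarter' (sortable, e.g. '2016Q3') and 'months_in_arrears'.
    For a building that is ever three or more months in arrears, keep the
    first such quarter and set default = 1; otherwise keep the most recent
    quarter and set default = 0.
    """
    by_building = {}
    for row in rows:
        by_building.setdefault(row["building_id"], []).append(row)

    snapshot = []
    for history in by_building.values():
        ordered = sorted(history, key=lambda r: r["quarter"])
        in_default = [r for r in ordered if r["months_in_arrears"] >= 3]
        chosen = in_default[0] if in_default else ordered[-1]
        snapshot.append({**chosen, "default": int(bool(in_default))})
    return snapshot


# Hypothetical panel with one defaulted and one performing building.
panel = [
    {"building_id": "A", "quarter": "2015Q1", "months_in_arrears": 0},
    {"building_id": "A", "quarter": "2015Q2", "months_in_arrears": 3},
    {"building_id": "A", "quarter": "2015Q3", "months_in_arrears": 5},
    {"building_id": "B", "quarter": "2015Q1", "months_in_arrears": 1},
    {"building_id": "B", "quarter": "2015Q2", "months_in_arrears": 0},
]
cross_section = collapse_panel(panel)
```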

Table 5 presents the estimates. Column 1 reports the results without controlling for other characteristics in the model. The EE estimate of − 0.7150 suggests that energy efficiency has a negative and highly significant relationship with default risk. The marginal effect of EE implies that a mortgage on an energy-efficient building is associated with a 34 basis point (bp) reduction in the probability of default relative to its non-efficient counterpart.Footnote 17 This reduction is about one half of the actual default probability, given that the default rate for a non-EE building is 0.66% on average (see Table 3). Since this result could be driven by various mortgage, building, or household characteristics, we include the corresponding control variables. One of the main concerns in this analysis stems from the provisional rating table. As mentioned earlier, the rating categories of the buildings are constructed by RVO based on building type and construction year period. This means that the results may be driven not by the actual rating but by either the building type or the age of the building. To disentangle the energy efficiency effect from other building characteristics, both building type and current 3-year age category are used as control variables. Additionally, we control for household characteristics (total income tercile and borrower age at origination) and mortgage characteristics (LTV, DSCR, loan term). Further, we include region fixed effects at the NUTS 3 level and year fixed effects. Column 2 shows that the negative relation between energy efficiency and the probability of default remains significant and quantitatively sizeable, with an estimated coefficient of − 1.3408. It is worth noting that LTV might be negatively affected by EE (due to the higher building value associated with EE). Because LTV is included as a control, the negative and significant coefficient captures the EE effect on top of the value channel. Adding market controls (i.e., unemployment, government bond yields, government bond yield volatility, and yield curve slope) and clustering the standard errors at the NUTS 3 region level does not affect the findings, as reported in the last two model specifications. In the most restrictive model specification 4, EE’s marginal effect on PD is about − 39 bps.
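As a rough plausibility check on the column 1 figures, the logit coefficient can be translated into a change in default probability. The sketch below is not the marginal effect computation referred to in Footnote 17; it assumes a discrete change evaluated at the average non-EE default rate of 0.66%, which is only one of several ways to define a marginal effect.

```python
import math

baseline_pd = 0.0066   # average default rate of non-EE buildings (from the text)
beta_ee = -0.7150      # EE coefficient in column 1 of Table 5 (from the text)

# Shift the baseline log-odds by the EE coefficient and map back to a probability.
baseline_log_odds = math.log(baseline_pd / (1.0 - baseline_pd))
ee_pd = 1.0 / (1.0 + math.exp(-(baseline_log_odds + beta_ee)))

change_bp = (ee_pd - baseline_pd) * 1e4                 # change in PD, basis points
relative_change = (ee_pd - baseline_pd) / baseline_pd   # change relative to baseline
```

Under this assumption, the implied change is a reduction of roughly 34 bp, or about one half of the baseline default rate, which is in line with the magnitudes reported above.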

Table 5 This table presents logit regression estimates to determine the propensity to default on mortgages backed by energy efficient buildings

We validate the above findings with a series of robustness checks. For this purpose, we take specification 4 in Table 5 as the baseline model and replace, redefine, or add covariates as described below. The additional model specifications are presented in Table 6, where, for convenience, we report only the regression coefficient associated with the EE dummy variable. Since it is common to estimate a credit risk model with original covariates, we replace the explanatory variable current LTV with original LTV. As presented under model specification 1, the main results are not driven by the covariate’s reporting date. Models 2 to 5 show that the results are not affected by the definition of the building and borrower age category. In models 2 and 4, we use the actual building and borrower age, respectively. In models 3 and 5, we redefine the age categories from 3- and 5-year categories to 9- and 15-year categories for building and borrower age, respectively. The baseline regression was estimated by omitting contemporaneous DTI and original balance. The reason is that DTI and DSCR measure the same information (i.e., the borrower’s ability to repay the debt), while total original balance has a correlation of 0.73 with total income. Consequently, we substitute DSCR with DTI in model 6 and add original balance in model 7. Finally, in model 8, both covariates, DTI and original balance, enter the regression equation, while DSCR is excluded. As presented in Table 6, none of these model specifications affects the main result.

Table 6 This table presents the robustness results for different logit model specifications

Extended Cox Model

The Cox model is typically employed to study survival data over time. Since our data set allows us to track mortgage health periodically, we employ the extended Cox model with time-varying covariates for the period January 2014 to May 2018.

Before presenting the regression results, it is important to confirm whether the proportional hazards assumption holds, as this could affect the interpretation of the results. Figure 2 presents the empirical survivor functions for energy-efficient and non-energy-efficient mortgages. From a visual analysis, we observe that the two curves neither cross nor diverge too much, suggesting that the proportionality assumption holds. The implication of this finding is that the estimated coefficients for the EE variable can be assumed to be constant over time, which means that the estimates do not depend on the reporting time of the last observed value. Additionally, the survivor curves suggest that energy-efficient mortgages survive for longer on average than their non-efficient counterparts.

Fig. 2 Survivor functions. This figure shows Kaplan-Meier time-to-default over a 20-year period for two mortgage groups: mortgages with energy-efficient (EE = 1) and non-energy-efficient (EE = 0) buildings. The log-rank test for equality of survivor functions gives a p-value of 0.0001

To further explore the observed relationship between EE and survival time, we run the extended Cox regression with time-varying covariates and present the results in Table 7. Model 1 reports the estimated log hazard ratios without controlling for mortgage or other characteristics. The regression coefficient is negative and highly significant (− 0.5652), confirming the findings from the logit regression. The other three model specifications likewise indicate that the time-varying nature of the covariates does not qualitatively affect the inverse relationship between EE and PD. EE’s marginal effects, however, are more pronounced than in the logit findings, ranging between − 74 bps (model 1) and − 66 bps (model 4). These larger values suggest that the PD-reducing effect of EE is more fully captured when the time-varying nature of the exogenous variables is taken into account. The time-varying covariates include current LTV, DSCR, loan term, and the macroeconomic variables. The first two variables vary over time as the individual loan components are being repaid. The loan term varies less frequently, as it only changes when additional loan components are added to those already in place.
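Because Table 7 reports log hazard ratios, exponentiating a coefficient gives the hazard of default for EE mortgages relative to non-EE mortgages. The short sketch below applies this standard reading to the model 1 coefficient; it is an illustration of the scale of the estimate and is distinct from the marginal effects in basis points reported above.

```python
import math

beta_ee = -0.5652                  # EE log hazard ratio, model 1 of Table 7 (from the text)
hazard_ratio = math.exp(beta_ee)   # hazard of EE relative to non-EE mortgages
percent_change = (hazard_ratio - 1.0) * 100.0
```

The resulting hazard ratio is about 0.57, i.e., under proportional hazards the instantaneous default risk of an EE mortgage is roughly 43% lower than that of a non-EE mortgage at any loan age.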

Table 7 This table presents extended Cox model estimates of the probability of mortgage default (log hazard ratios)

To validate these results, we apply robustness exercises similar to those for the logit regression. The only difference is that model specification 1 is omitted, since the main property of the Cox regression is to include original as well as current covariate values in the regression analysis. Table 8 presents the robustness results. The estimates suggest that neither the redefinition of borrower and building age categories (models 2 through 5) nor the inclusion of additional covariates that might raise multicollinearity concerns (models 7 and 8) affects the main finding. The results are quantitatively and qualitatively similar to the baseline estimate.

Table 8 This table presents the robustness results for different Cox model specifications (log hazard ratios)

In addition, we investigate the extent to which the degree of energy efficiency plays a role in credit risk. The findings suggest that mortgages on more efficient buildings are less prone to default. For brevity, we include the results in Appendix B.

Economic Mechanism

As shown by Zancanella et al. (2018), the benefits of energy-efficient buildings come from savings rather than income. Borrower savings, such as reduced energy bills and home insurance costs (i.e., home insurance for green buildings is less expensive than for non-efficient buildings), should result in more disposable income in the event of emergencies or unexpected events.Footnote 18

To measure whether such an effect exists, we decompose the EE variable according to the income tercile group,

$$ \text{EE}_{i} = \sum\limits_{j=1}^{3}\text{IncQ}j_{i}\times\text{EE}_{i} $$

where \(\text{IncQ}j_{i}\) is equal to one if individual i belongs to tercile group j, and zero otherwise.
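The decomposition amounts to replacing the single EE dummy with three interaction terms, exactly one of which can be non-zero for a given borrower. A minimal sketch, in which the function name and the example observations are assumptions:

```python
def ee_by_income(ee, income_tercile):
    """Split an EE dummy into three income-specific dummies.

    ee: 0/1 indicator for an A or B rating.
    income_tercile: 1 (high), 2 (medium) or 3 (low).
    Returns (IncQ1xEE, IncQ2xEE, IncQ3xEE); the entry for the borrower's
    own tercile equals `ee` and the others are zero, so the three terms
    sum back to `ee`.
    """
    return tuple(int(income_tercile == j) * ee for j in (1, 2, 3))


# Hypothetical buildings: (EE dummy, income tercile of the borrower).
sample = [(1, 1), (1, 3), (0, 2)]
interactions = [ee_by_income(ee, q) for ee, q in sample]
```

Each interaction coefficient then measures the EE effect within one income group, relative to non-EE mortgages.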

Table 9 reports the energy rating distribution according to the income tercile group. The energy-efficient buildings (A/B rating) account for 32.61% of the total sample, 43.45% of the high-income group (INCQ1), 32.66% of the middle-income group (INCQ2), and 21.71% of the low-income group (INCQ3). The EE share decreases by approximately 11 percentage points when moving from a higher to a lower income group. As expected, within the same rating category, the loan defaults increase for lower income classes. For instance, the default rate for the rating class A is equal to 0.22 (INCQ1), 0.24 (INCQ2), and 0.34 (INCQ3).

Table 9 This table presents the energy rating distribution of all and defaulted Dutch loans according to the income tercile group (INCQ1=high, INCQ2=medium and INCQ3=low)

Table 10 presents the regression results for the logit model. All other explanatory variables remain unchanged with respect to the previous estimates. It is worth noting that IncQ1×EE is not significant in model specifications 1 and 2, while it is negative and significant at the 10% level in models 3 and 4, with an estimated coefficient of − 1.4613. Interestingly, IncQ2×EE and IncQ3×EE show negative and significant estimated coefficients in all four specifications. In models 3 and 4, the magnitude of the coefficient for IncQ3×EE is always higher than for IncQ2×EE and IncQ1×EE.Footnote 19 In these two models, the marginal effect of EE by income group corresponds to a change in the probability of default of − 39 bps (IncQ1×EE), − 45 bps (IncQ2×EE), and − 46 bps (IncQ3×EE) relative to the non-efficient counterpart. Considering that the average default rate for IncQ3 is 0.93%, the reduction in default probability is economically significant, amounting to about half the average default probability for low-income borrowers. This suggests that energy efficiency better mitigates the default risk of borrowers with lower incomes. In this regard, the economic channel is represented by savings that come from reduced costs, which have a greater relative impact on borrowers with less disposable income.

Table 10 This table presents logit regression estimates to determine the propensity to default on mortgages backed by energy efficient buildings according to the income tercile group

For completeness, Table 11 shows the estimates for the extended Cox model. The results are similar to the logit results for model specification 1 where IncQ1×EE is not significant, while IncQ2×EE and IncQ3×EE are negative and significant with almost the same order of magnitude. With respect to the other specifications, we find that energy efficiency mitigates default risk for the borrowers in the second tercile (IncQ2×EE) relative to those belonging to the first (IncQ1×EE). However, we do not find such evidence for IncQ3×EE.

Table 11 This table presents extended Cox model estimates of the probability of mortgage default (log hazard ratios)

Nevertheless, there is still a significant mitigation effect on the probability of default in the Cox model relative to the non-EE counterpart. This implies that the lowest income group IncQ3 still benefits from energy efficiency in terms of lower default risk compared to the non-EE income classes, but the effect is smaller in magnitude than for the higher income classes IncQ1 and IncQ2 in the EE group. It is worth noting that in the baseline specification 1 in Table 11, the income effect is also present for IncQ3, and the magnitude of the corresponding coefficients decreases when the other time-varying covariates are introduced in the Cox model. One possible explanation is that the borrower’s income is only available at the time of loan origination (i.e., unlike the other covariates, it is time-independent). Therefore, the effect on the coefficient IncQ3×EE might be partially absorbed by time-dependent controls such as the DSCR and the current LTV ratio.

In summary, our findings confirm a mitigation effect on the default probability with respect to the non-EE counterpart. Both models suggest that lower-income households benefit more from energy efficiency than higher-income households, which supports the income channel effect (e.g., through the increase of the disposable income). However, the findings for the lowest-income group survive after the inclusion of control variables only in the logit model but not in the Cox model. As discussed above, this could be attributed to the borrower income, which is observable only at the loan’s origination date.


This study identifies a relationship between building energy efficiency and mortgage default risk. We use a unique data set consisting of Dutch loan-level data supplemented with provisional building energy efficiency ratings obtained from RVO’s rating categories. In the empirical analyses, we exploit the panel structure of the data set, the technological progress, and the non-simultaneous changes in energy efficiency ratings across construction years and building types to disentangle the energy efficiency component from type- and age-specific effects typically associated with borrower default risk.

We make use of two empirical methodologies, the logit regression and the extended Cox model, and find that energy efficiency is negatively related to a borrower’s likelihood of defaulting on mortgage payments. The results hold after accounting for borrower, mortgage, and market control variables. A series of robustness checks confirms that the findings are not driven by any particular assumptions. As a consequence, the discriminatory power of a model using both the usual credit variables and the EE variable significantly exceeds that of models that use only the traditional credit variables. This suggests that EE ratings complement rather than substitute for borrower credit information and that a lender who uses information from both sources (borrower credit information and EE ratings) can make superior lending decisions compared to lenders who do not exhaust all available information.

Furthermore, we investigate whether there is evidence, on top of unobservable characteristics of the borrower, of any economic mechanism that mitigates the default risk of lower-income borrowers. The income channel from the logit regression shows that savings coming from reduced costs, such as energy bills and home insurance costs, have a greater impact in relative terms on borrowers with less disposable income. However, the finding for the lowest income group is not confirmed in the Cox regression. This could be due to the fact that household income, unlike LTV and DSCR, is observable only at the loan’s origination date. Nevertheless, the economic channel is confirmed in the mitigation of the default risk for the average household.

These aspects are not only crucial for shaping future energy policy, but also have implications for the risk management of European financial institutions. In fact, lower default risk for mortgages on energy-efficient residential buildings could imply different mortgage pricing (i.e., lower interest rates).

The presented findings are a first step in understanding whether and to what extent energy efficiency plays a role in the European mortgage market.