Buildings’ Energy Efficiency and the Probability of Mortgage Default: The Dutch Case

We investigate the relationship between building energy efficiency and the probability of mortgage default. To this end, we construct a novel panel data set by combining Dutch loan-level mortgage information with provisional building energy ratings provided by the Netherlands Enterprise Agency. Using logit regression and the extended Cox model, we find that building energy efficiency is associated with a lower probability of mortgage default. Three possible channels might drive the results: (i) personal borrower characteristics captured by the choice of an energy-efficient building, (ii) improvements in building performance that could help free up the borrower’s disposable income, and (iii) improvements in dwelling value that lower the loan-to-value ratio. We address all three channels. In particular, we find that the reduction in default risk associated with energy efficiency is greater for borrowers with less disposable income. The results hold across a battery of robustness checks. This suggests that energy efficiency ratings complement borrowers’ credit information and that a lender using information from both sources can make better lending decisions than a lender using only traditional credit information. These aspects are not only crucial for shaping future energy policy, but also have implications for the risk management of European financial institutions.


Introduction
Buildings account for 40% of EU energy use (European Parliament and Council 2010) and it is projected that 75-90% of the EU building stock will still be standing in 2050. Therefore, improving buildings' energy efficiency (EE) is among the top-priority measures that can contribute to meeting the EU's commitment to reduce energy consumption and greenhouse gas emissions. From the perspective of mortgage lenders and investors, who have shown growing interest in "green", "sustainable", and "energy-efficient" products in recent years, investing in building performance improvements seems to be an attractive market segment. Support comes from studies around the world documenting that homebuyers and commercial investors recognize the contributory value of increased energy efficiency. Both groups of buyers demand a greater discount for less energy-efficient buildings and price energy certifications into property values. While the positive relationship between EE and sales prices is well documented, it is less obvious whether EE has an impact on borrower credit risk. Understanding the importance of EE to mortgage default is critical: if EE provides significant information to predict mortgage default, then banks or other financial intermediaries (such as fintech companies) should incorporate this information into their credit scoring models. With this paper, we aim to shed light on this aspect by focusing on the residential mortgage market and examining the correlation between EE and the probability of default (PD).
There are three potential channels that might drive the results: (i) personal characteristics of the borrowers captured by the choice of an EE building (e.g., environmental consciousness); (ii) improvements in building performance that help free up a borrower's disposable income through lower utility bills and thus reduce default risk; and (iii) the positive effect on the dwelling value and thus on the loan-to-value ratio (LTV), which lowers default risk. Following the findings of An and Pivo (2020), we address the latter channel by controlling for contemporaneous LTV. Regarding the former two channels, our findings suggest that energy efficiency might play a significant role with regard to default risk either due to unobservable borrower characteristics or the additional cash flow from energy savings.
We seek to contribute to this strand of research by using a loan-level data set that we combine with information on building energy efficiency. We document whether there is indeed evidence of a reduction in default rates for borrowers living in energy-efficient buildings and whether this reduction is greater for lower-income borrowers.
We use loan-level data from the Dutch mortgage market and investigate the relationship between the energy efficiency of a building and the probability of mortgage default. Focusing on residential buildings (residential detached/semi-detached houses, apartments and terraced houses), our sample consists of mortgages issued on more than 120,000 dwellings. We complement the data set with provisional energy efficiency ratings assigned by the Netherlands Enterprise Agency (Rijksdienst voor Ondernemend Nederland, in short RVO) to all Dutch buildings that have not yet been assigned the actual energy performance certificate (EPC) rating. RVO provides rating categories for 60 combinations of building type and construction period in the Netherlands. This allows us to match the loan data with EPC ratings according to building type and construction year.
We disentangle the energy efficiency component from building type and age specific effects, which are typically associated with borrower default risk, by exploiting three aspects of the data set. First, because of the panel structure of the data covering the period January 2014 to May 2018, we can construct a time-varying building age category variable that can be used as a control for age while still maintaining variation in the EE variable within specific age categories. Second, the construction years in the data span 1992, the year in which the Dutch Building Decree (Bouwbesluit) came into force, requiring better roof and floor insulation for newly built dwellings. This legislative order led to an immediate improvement in EPCs across all building types, so the improvement in EE can be interpreted as a consequence of technological progress rather than a linear function of time. Third, we take advantage of the fact that some EPC ratings change asynchronously over time across the different building types, i.e., for some building types the EPC rating improves earlier than for others. This leaves some degree of variation that cannot be captured exclusively by type or year of construction.
We employ two empirical methodologies, logit regression and the extended Cox model, and find that energy efficiency is negatively related to a borrower's likelihood of defaulting on mortgage payments. The results hold when we account for borrower, mortgage, and market control variables. The findings survive a series of robustness checks. This implies that the energy efficiency rating picks up previously unobserved building and borrower characteristics and/or cash savings that are not captured by the usual variables considered for credit scoring.
Furthermore, we show that buildings' EE mitigates default risk more strongly for lower-income borrowers. We therefore provide evidence for the relevance of the economic channel (on top of personal characteristics): savings from reduced costs (i.e., energy bills and insurance costs) might have an impact, at least for borrowers with less disposable income.
In the remainder of this paper, we first provide a more detailed account of the recent developments in the energy rating landscape and present related literature in "Background and Related Literature". In "Data, Energy Efficiency Definition, and Statistics", we explain the construction of the data set and present relevant descriptive statistics. In "Methodology", we outline the methodology. "Empirical Results" discusses the results and "Conclusion" concludes.

Background and Related Literature
This section places this work in context with the historical development of the energy efficiency assessment landscape and previous studies related to energy efficiency as well as the associated findings.

Historical Background on Energy Efficiency Ratings
Over the last three decades, the building sector has experienced rapid growth in the implementation of energy-efficient building technologies. Nevertheless, the current renovation rate for energy efficiency in Europe is about one percent per year, which means that a timely transition to the legally binding target of net zero greenhouse gas emissions by 2050 cannot be guaranteed at this level (Economidou et al. 2019).
In order to make such improvements comparable between different buildings, the energy efficiency components of a building need to be measured, evaluated, and combined into an easily interpretable indicator, i.e., a rating. Currently, the landscape of rating systems is quite diverse. In the United States, for example, various energy efficiency certifications coexist and compete with each other. In Europe, on the other hand, the energy performance certificate is well known but the information inherent in it varies from country to country. In Germany, for instance, two definitions, an energy-consumption and an energy-demand perspective, coexist under the same EPC label (see Weiss et al. 2012). This provides a challenging research environment for the question at hand: what is the relation between building energy efficiency and mortgage default risk? The answer to this question has the potential to unlock benefits for borrowers, lenders, and investors alike.
In the United States, the history of energy efficiency labels dates back to the early and mid-1980s, when Alaska and California took the first steps towards improving the efficiency and affordability of housing (see Farhar et al. 1997). About a decade later, in 1995, the non-profit organisation Residential Energy Services Network (RESNET) took the initiative to develop the Home Energy Rating System (HERS)⁴, and the governmental Environmental Protection Agency (EPA) introduced the ENERGY STAR certification program⁵ for newly constructed single-family homes. At the same time, the government-owned National Renewable Energy Laboratory (NREL) initiated a pilot program to introduce a new financial product, the "energy-efficient mortgage", and link it to a building's energy efficiency. Once the mortgages were distributed, the next step was to evaluate the program. Among other goals, the evaluation phase was intended to analyse the extent to which buildings' energy efficiency is linked to the probability of mortgage default (see Farhar et al. 1997; Farhar 2000). The results from this analysis would have been the first of their kind. However, the study was either not conducted or not published. The reasons for this remain unknown. In the years since, none of the published energy efficiency reports have been able to provide a thorough analysis either. Problems with data availability were reported as the main reason for this research gap (see, e.g., Hammon 2005).
⁴ A HERS index was introduced in 2006. It is normalized to the climatic zone, size, and type of the house. A HERS value of 100 represents the current market standard home construction. Most house scores range from 0 to 150; the lower the number, the better, so a net-zero-energy house gets a 0.
⁵ An ENERGY STAR rated home typically achieves a HERS rating of 85 or less.
In Europe, Denmark and the United Kingdom were among the first countries to conduct energy efficiency assessments of buildings in the 1970s and 1980s, respectively. In the early and mid-1990s, various European countries introduced mandatory energy efficiency requirements, which were accompanied by the development and implementation of corresponding rating systems. To name a few, in the UK, BREEAM (Building Research Establishment Environmental Assessment Methodology) and NHER (National Home Energy Rating Scheme) were both introduced in 1990. In Ireland, ERBM (Energy Rating Bench Mark) was created in 1992, while in the Netherlands the energy performance of buildings has been measured since the mid-1990s. In 2002, the EPC was introduced as a requirement for the member states of the European Union by the Energy Performance of Buildings Directive (see European Parliament and Council 2002). As a result, all member states and some other European countries have established national building rating policies during the past two decades. However, despite these initiatives, the use of European energy rating information for research on the financial performance of real estate is rather rare. This paper is among the first attempts to shed light on this issue.

Literature on Energy-Efficient Buildings and Mortgage Default Risk Drivers
In the traditional loan origination business, the default risk of applicants for consumer and mortgage loans is usually assessed through the use of credit scores. These scores are the result of a statistical model that maps an applicant's characteristics to the probability that he or she will default on the loan. The lender uses the prediction of an applicant's likelihood of default to determine the volume of credit granted and the interest rate charged. Typically, the input variables of a statistical model include behavioural, financial, and demographic information. These are usually supplemented by loan-specific characteristics, such as the LTV ratios for mortgage loans. Credit scoring methods are continuously refined, either by introducing new models or by adding new variables or characteristics.
An important question for both practitioners and researchers is whether the inclusion of the mortgage-specific attribute "energy-efficient" or "green" in the lender's scoring model adds value. The theoretical motivation is that mortgages issued on energy-efficient houses should carry lower risk than mortgages on less efficient houses. The argument is that the borrower's savings on energy use leave more disposable income for emergencies or unexpected events. Burt et al. (2010) argue that house ratings can accurately predict annual energy costs, which should translate into lower default risk. That is, energy efficiency frees up a part of the borrower's income, which improves the ability to repay debt.
However, actual research on this topic is limited. To date, only a few studies have been conducted in this area, and all of them rely exclusively on residential and commercial mortgage data from the United States. We contribute to this strand of literature by focusing on the Dutch residential mortgage market.
A recent study on the relationship between energy efficiency and the probability of default of residential mortgage loans was conducted by Kaza et al. (2014). The authors use information on about 71,000 loans originated between 2002 and 2010 in the United States for owner-occupied single-family homes.
Their findings show that ENERGY STAR certified dwellings are associated with substantial and significant reductions in default and prepayment risk. The authors argue that the lower risk associated with energy efficiency could be due to energy savings or simply because borrowers with energy-efficient homes are financially better off than those with less efficient homes. To date, the literature has examined the relationship between energy efficiency and mortgage default risk without analyzing the economic channels. A notable exception is the work of An and Pivo (2020) on commercial buildings. The authors examine the relationship between energy-efficient buildings that carry an ENERGY STAR or LEED label and the corresponding commercial mortgage default risk. The underlying loan sample consists of approximately 6,300 commercial mortgages originated between 2000 and 2013 in 17 Metropolitan Statistical Areas across the United States. The results show that traditional default predictors do not fully reflect the financial benefits of energy efficiency: loans on ENERGY STAR or LEED certified buildings are 34% less likely to default than their non-certified counterparts. The authors provide evidence of a green premium that reduces the default risk of energy-efficient buildings in two ways: a green rent component that increases the mortgage debt service coverage ratio (DSCR), and a green value component that increases the owner's equity position and reduces LTV. A higher DSCR and a lower LTV are both associated with lower default risk. In our paper, we instead consider residential mortgages and the role of household income in addition to mortgage variables.
In addition to using energy efficiency characteristics alone, studies have shown that buildings with higher sustainability ratings are also less susceptible to default risk. By analysing data sets on residential and multifamily homes, Rauterkus et al. (2010) and Pivo (2014) find that sustainability features such as building location, transportation access (e.g., closeness to freeways, subways, work) or housing affordability also play an important role in borrowers' ability to repay their debt.
Other studies focus on personal characteristics that bear on the choice of energy-efficient alternatives without considering them as potential risk drivers in the relationship with mortgages. In fact, in most cases, personal characteristics, such as beliefs and attitudes, are not available and are therefore considered hidden. Black et al. (1985) emphasize that the effects of sociodemographic variables on energy efficiency-oriented behaviour are in most cases indirect and influenced by social norms (awareness of social consequences of energy efficiency), moral norms (concerns about the energy problem), and personal norms (perceived self-interest such as energy bill savings). More recently, Busic-Sontic et al. (2017) have shown that three of the five personality traits, namely openness to experience, extraversion, and agreeableness, are significant predictors of EE investment.⁷ Finally, Lange and Dewitte (2019) show that cognitive flexibility can explain differences in pro-environmental behaviour.
To summarize, current literature on the direct relationship between energy efficiency and default risk is sparse and focuses on the U.S. housing market. Moreover, only the study of Kaza et al. (2014) employs residential mortgage data to investigate the impact of energy efficiency. The presented results are supportive of a significant and inverse relationship between energy efficiency and mortgage default risk. In this paper, we contribute to this relatively recent strand of literature by focusing on the Dutch residential mortgage market and confirming that EE building standards are related to lower mortgage default rates. We therefore suggest that also for Europe, there is a relationship between EE ratings and default rates that should be incorporated into lenders' risk assessment models. We also provide evidence that the EE rating not only subsumes potential unobservable borrower characteristics, but also captures the effect of lower annual energy costs for lower-income borrowers on their economic ability to repay mortgage debt.

Data, Energy Efficiency Definition, and Statistics
This section details the construction of the data set used in the analysis. The steps include (i) the selection and aggregation procedure of the loan-level data, (ii) the definition of energy-efficient building ratings, and (iii) the methodology of merging the two data sets. In addition, we present the selection of variables for the analysis and the respective summary statistics.

Data and Sample Selection
In the following analysis, we employ Dutch mortgage data obtained from the European DataWarehouse (ED).⁸ ED provides a comprehensive data set with periodically updated dynamic and static individual loan-level information on securitized European mortgages.⁹ We restrict the data sample according to the following criteria. The sample period covers January 2014 to May 2018 and the country of assets is limited to the Netherlands. The type of borrower is "individual" and the primary income is between EUR 20,000 and 1,000,000. The property type is "residential detached/semi-detached house", "apartment", or "terraced house". The building's occupancy type is restricted to "owner-occupied" and the construction year of the buildings is between 1900 and 2016. In addition, we focus only on fixed-interest rate mortgages and exclude repurchased ones. Finally, we require that each borrower be associated with exactly one building and vice versa. Appendix A provides an overview of the variables selected for the analysis.
⁷ The OCEAN model in trait theory identifies five personality factors: (i) openness to experience (prefers routine/practical vs. imaginative/spontaneous); (ii) conscientiousness (impulsive/disorganized vs. disciplined/careful); (iii) extraversion (reserved/thoughtful vs. sociable/talkative); (iv) agreeableness (suspicious/uncooperative vs. trusting/helpful); and (v) neuroticism (calm/confident vs. anxious/pessimistic) (e.g., see Matthews et al. 2003).
⁸ The European DataWarehouse is part of the European Central Bank ABS Loan Level Initiative. It provides an open platform for users to access over 1,250 ABS data transactions and private portfolios belonging to different originators across Europe.
⁹ A comprehensive overview of loan-level data templates including detailed variable descriptions on residential mortgage-backed securities (RMBS) data sets is available on ECB's website: https://www.ecb.europa.eu/paym/coll/loanlevel/transmission/html/index.en.html.
After applying the above selection criteria, our final data set totals 273,024 individual mortgage components that are associated with 127,309 individual buildings. The discrepancy between the number of mortgage components and the number of underlying buildings results from the Dutch-specific tax treatment of mortgages. A typical Dutch mortgage loan consists of multiple loan parts, e.g., a bank savings loan part that is combined with an interest-only loan part. This is more common for mortgages originated prior to 2013 when there was a specific tax preference for interest-only mortgages. Besides the tax reasons, the number of mortgage components may go beyond two if a borrower takes out an additional mortgage on the same building at a later date.
For the analysis, we aggregate loan component information at the building level. For certain variables, this is already done by the data provider. For example, variables such as LTV or debt-to-income (DTI) are available at the borrower level (i.e., the same value is reported for each loan component). Where necessary, we compute for each building the average variable value across loan components weighted by the loan component's original balance. "Choice of Variables and Summary Statistics" provides further details on this aggregation procedure.
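The balance-weighted aggregation described above can be illustrated with a minimal Python sketch. The field names (`building_id`, `original_balance`, `loan_term_months`) and both helper functions are hypothetical and chosen for illustration only; they are not the variable names of the ED templates, and the sketch is not the code used for the analysis.

```python
from collections import defaultdict


def weighted_average(values, weights):
    """Return the weighted mean of values; weights must sum to a positive number."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def aggregate_to_building(components, field):
    """Collapse loan components to one value per building.

    components: list of dicts with the hypothetical keys 'building_id',
    'original_balance', and the variable named by `field`.
    The weight of each component is its original balance.
    """
    grouped = defaultdict(list)
    for comp in components:
        grouped[comp["building_id"]].append(comp)
    return {
        building: weighted_average(
            [c[field] for c in comps],
            [c["original_balance"] for c in comps],
        )
        for building, comps in grouped.items()
    }


# Illustrative (made-up) loan components: building B1 has two loan parts.
components = [
    {"building_id": "B1", "original_balance": 100_000, "loan_term_months": 360},
    {"building_id": "B1", "original_balance": 50_000, "loan_term_months": 240},
    {"building_id": "B2", "original_balance": 80_000, "loan_term_months": 300},
]
loan_term = aggregate_to_building(components, "loan_term_months")
```

In this made-up example, building B1 combines a 360-month part of EUR 100,000 with a 240-month part of EUR 50,000, so its building-level loan term is (360 x 100,000 + 240 x 50,000) / 150,000 = 320 months.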

Defining Energy Efficiency
For the classification of buildings into different energy efficiency categories, we rely on the Dutch energy performance reference table prepared by the RVO. The intention behind this table is to determine a provisional energy label for all existing Dutch residential buildings. The provisional EPC indicates the energy performance of a reference building that was developed using cadastral data (i.e., area, date of construction, building type, quality of insulation of floors, roof and walls, and systems for heating, hot water, and renewable energy) of the Dutch residential building stock. The dwelling owners are encouraged to modify or add information about energy improvement measures, which must be approved by a qualified expert before being posted on the website. In this respect, owners must also provide evidence of the measures carried out, such as invoices and photographs. The qualified expert reviews the uploaded changes and documents before approving the definite EPC. Based on this approval, the provisional EPC is finally replaced by the actual, new EPC, which is registered at RVO. This final EPC is based on a national calculation method that takes into account the retrofit measures carried out by the owner of the dwelling.
In the provisional rating table, the label classes are calculated as described in RVO (2014). This document relies on studies conducted on the Dutch residential market, including Boumeester et al. (2008), Agentschap NL (2011), and Agentschap NL (2013). The RVO has drawn up 60 reference buildings that serve to determine the provisional energy label.¹¹ The ordinal rating scale ranges from G (lowest energy efficiency) to A (highest energy efficiency). For each type of dwelling and year of construction, the most common characteristics of the house were studied in terms of flooring, roof, heating and ventilation system, presence of solar panels, etc., and each characteristic was assigned a code. Then, based on these properties, the approximate energy consumption of the reference dwelling was calculated and a corresponding energy label between G and A was assigned.
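The matching of a dwelling to its provisional label by property type and construction year can be sketched as a simple table lookup. The following Python sketch assumes the ten construction periods that RVO's table distinguishes; the function names and the two-entry `example_table` are hypothetical, and the ratings in that excerpt are illustrative values rather than a reproduction of Table 1.

```python
from bisect import bisect_right

# First construction year of each RVO period after "1945 and earlier",
# following the ten periods that RVO's table distinguishes.
PERIOD_STARTS = [1946, 1965, 1975, 1983, 1988, 1992, 2000, 2006, 2014]
PERIOD_LABELS = [
    "1945 and earlier", "1946-1964", "1965-1974", "1975-1982", "1983-1987",
    "1988-1991", "1992-1999", "2000-2005", "2006-2013", "2014 and later",
]


def construction_period(year):
    """Map a construction year to the label of its RVO construction period."""
    return PERIOD_LABELS[bisect_right(PERIOD_STARTS, year)]


def provisional_label(property_type, construction_year, rating_table):
    """Look up the provisional rating for a dwelling.

    rating_table maps (property type, period label) to a rating from A to G.
    Returns None if the combination is not in the table.
    """
    key = (property_type, construction_period(construction_year))
    return rating_table.get(key)


# Hypothetical excerpt of a rating table (illustrative values, not Table 1).
example_table = {
    ("apartment", "1988-1991"): "C",
    ("apartment", "1992-1999"): "B",
}
label = provisional_label("apartment", 1995, example_table)
```

Because the periods are of unequal length, the lookup is done on period boundaries rather than on a fixed-width bucketing of the construction year.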
It should be noted that the provisional label assignment procedure is less accurate than the assignment of actual EPC ratings. Unfortunately, the actual EPC ratings are not available. Therefore, a measurement error may be present. The measurement error can be random or systematic. If the error is random, then it should average out given the size of the data set.¹² If the error is systematic, then it should be attributed to one of two possible sources: the energy efficiency of buildings either (i) systematically improved after construction (e.g., due to homeowner participation in large-scale retrofit programmes) or (ii) systematically deteriorated. We can rule out systematic rating deterioration because it is irrational for a large base of homeowners to deliberately reduce their buildings' energy efficiency or to neglect building maintenance. On the other hand, a systematic improvement in energy efficiency is a legitimate concern.¹³ However, if actual energy ratings improved after a building was constructed, then some buildings that we classify as less efficient are in fact more efficient, so our results should be placed at the lower end of the estimation spectrum, meaning that our findings are likely to underestimate the true relationship between EE and PD. As in other studies that have only EPC ratings available, uncertainty remains regarding actual energy consumption. Namely, buildings with the same energy rating do not necessarily have the same gas or energy consumption, which depends on the individual consumption patterns of borrowers. Consequently, consumption behaviour is usually treated as idiosyncratic. Table 1 provides an overview of the final energy classification for the analysis. It is evident that energy efficiency has improved over time, with the most efficient buildings built after 2006.
Table 1 This table presents the energy rating distribution across property types and construction years. Rows: property type. Columns: construction year (1900-1945, 1946-1964, 1965-1974, 1975-1982, 1983-1987, 1988-1991, 1992-1999, 2000-2005, 2006 or later). The rating scale ranges from A to G, with A being the highest rating. RVO's rating categories are obtained from http://energielabelatlas.nl/info/index.html and adjusted according to the property type definition in the mortgage dataset. Property type "residential detached/semi-detached house" in ED's dataset corresponds to property types "vrijstaande woning" and "twee/één kapwoning" in RVO's table, ED's "apartment" corresponds to RVO's "flat/apartment", and ED's "terraced house" corresponds to RVO's "rijwoning tussen".
¹¹ RVO's table differentiates between six building types (detached house, semi-detached house, row home corner, terraced house between, flat/apartment, maisonette) and ten construction year periods (1945 and earlier, 1946-1964, 1965-1974, 1975-1982, 1983-1987, 1988-1991, 1992-1999, 2000-2005, 2006-2013, and 2014 and later). The basis for the development of the reference buildings are the Dutch national housing surveys WoonOnderzoek Nederland (WoON 2006) 2005 and WoON 2012.
¹² A random error may be present because the reference buildings are a schematic representation of the Dutch dwelling stock, which means that actual buildings may deviate in their shapes and energy installation according to homebuilders' idiosyncratic preferences. However, some features may not differ too much from the reference building. For instance, 84% of the Dutch housing stock has central heating and the two most common facade types cover 94% of the facades in the Netherlands (78% are "brickwork cavity" and 15% are "masonry solid") (cf. WoON 2006).
¹³ Homeowners can gradually improve the energy performance (i) as part of building maintenance activities (e.g., by installing better insulation while repairing the roof), (ii) due to deliberate retrofitting measures, or (iii) due to large-scale government-sponsored retrofitting programmes.
Furthermore, it should be noted that the construction year periods are not of equal length. This means that energy efficiency improvement is not a linear function of the year of construction. It is technological progress and legislation (and not simply time) that drive the rapid energy efficiency improvement during certain periods. In particular, the Building Decree (Bouwbesluit), which came into force in 1992, stands out as an important piece of legislation requiring better roof and floor insulation for newly built dwellings. Its impact can be observed in the rating improvements from C to B between the 1988-1991 and 1992-1999 periods. As explained in the next section, we take advantage of the panel structure of the data and define a variable for building age that allows us to capture the EE effect arising from the rapid improvement in energy efficiency (i.e., technological progress) between the years 1991 and 1992. In addition, we can observe that some ratings do not change simultaneously across property types and construction years. This feature allows us to decouple the energy efficiency component from the construction year and type effect when examining the degree of energy efficiency in Appendix B. Table 2 presents the energy rating distribution of all buildings in the sample, and Table 3 reports the building distribution across Dutch provinces. In both tables, a mortgage on a building is marked as defaulted if at least one of the mortgage components is in arrears for at least 90 days. It can be observed that C-rated buildings form the largest bucket in the sample and E-rated buildings the smallest, while the remaining ratings are distributed reasonably evenly.
Column three in Table 2 reports the percentage of defaulted mortgages within each rating category. In this context, the rising share of defaults associated with a lower energy efficiency rating is worth highlighting. Overall, the share of defaulted mortgages is rather low at 0.55%. From Table 3, we can see that the mortgages are not evenly distributed across the Dutch provinces, with the largest share stemming from Holland. Within each province, between one-fifth and one-half of buildings are classified as energy-efficient (i.e., having an A or B rating). Among defaulted loans in each province, energy-efficient mortgages always account for a smaller share than non-efficient mortgages.
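The default definition used in Tables 2 and 3 and the default share per rating category can be sketched in a few lines of Python. The input layout, a rating paired with the days in arrears of each loan component, is a hypothetical simplification of the loan-level data, and the numbers in the example are made up.

```python
def building_defaulted(days_in_arrears, threshold=90):
    """A building's mortgage counts as defaulted if at least one loan
    component is in arrears for at least `threshold` days."""
    return any(days >= threshold for days in days_in_arrears)


def default_share_by_rating(buildings):
    """Share of defaulted buildings within each energy rating.

    buildings: iterable of (rating, [days in arrears per loan component]).
    """
    totals, defaults = {}, {}
    for rating, arrears in buildings:
        totals[rating] = totals.get(rating, 0) + 1
        defaults[rating] = defaults.get(rating, 0) + int(building_defaulted(arrears))
    return {rating: defaults[rating] / totals[rating] for rating in totals}


# Made-up example: two A-rated buildings (one defaulted) and one G-rated building.
sample = [("A", [0]), ("A", [0, 120]), ("G", [91])]
shares = default_share_by_rating(sample)
```

The flag is set at the building level, so a single delinquent loan part is enough to mark the whole mortgage as defaulted.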

Choice of Variables and Summary Statistics
The control variables for the analyses are those identified in the existing literature as having a significant impact on mortgage default probability (see An and Pivo 2015).
In particular, the variables are chosen in order to account for potential risk-mitigating channels that might arise due to energy efficiency. The variables can be categorized into four different types: mortgage, building, borrower, and macroeconomic/financial variables. Among mortgage variables, we employ contemporaneous LTV, contemporaneous DSCR, contemporaneous DTI, and loan term. LTV and DTI are reported by ED. 14 The DSCR for each building is defined as the ratio of total monthly income to total monthly periodic constant payments, the latter being the sum of monthly periodic constant payments across all loan components associated with the building. The periodic constant payments were calculated using the contemporaneous loan balance, the interest rate, and the number of periods remaining until maturity. The loan term at the building level is defined as the difference between the issue date and the maturity date (measured in months) and aggregated as the original balance-weighted average across all loan components associated with that building. Using contemporaneous LTV allows us to control for the potential value channel that might arise due to energy efficiency. As pointed out by An and Pivo (2020), energy efficiency is likely to improve the dwelling's market value, which in turn should lower the contemporaneous LTV. Chegut et al. (2020) document that Dutch energy efficient dwellings were valued significantly higher in 2015 compared to the baseline year of 2010.
This suggests that the reported contemporaneous LTV ratios in our panel data set, which covers the period January 2014 to May 2018, should reflect the market value of buildings' energy efficiency. Thus, LTV appears to be an appropriate control for the value channel, and our findings should therefore be interpreted as an effect on top of the value-induced risk reduction effect. Regarding the income channel, i.e. the improvement in the borrower's disposable income due to energy efficiency, neither DSCR nor DTI is an appropriate control variable because they only capture general household income, but not savings from energy efficiency. Unfortunately, we do not have access to information on borrower energy use that could address this issue. However, in "Economic Mechanism" we examine the income channel by differentiating between high-, middle-, and low-income households on top of the classical DSCR or DTI ratio used in the default probability models to capture the additional cash flow due to energy savings.

Table notes: Column 4 provides the percentage share of each province within the total sample of loans. Column 5 states the share of energy efficient buildings (defined as A or B-rated buildings) within each province. Columns 6 and 7 depict the percentage share of defaulted non-energy efficient and energy efficient mortgages within a province. The total number of unique buildings is 126,036.
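The DSCR construction described above can be illustrated with a minimal sketch. It assumes that the "periodic constant payment" is the standard level annuity payment with monthly compounding; the paper does not state the exact formula, and the function names and example figures below are hypothetical.

```python
def periodic_constant_payment(balance, annual_rate, months_remaining):
    """Level monthly payment that amortizes `balance` over `months_remaining`.

    Assumption: standard annuity formula with monthly compounding.
    """
    if months_remaining <= 0:
        raise ValueError("months_remaining must be positive")
    r = annual_rate / 12.0
    if r == 0:
        return balance / months_remaining
    return balance * r / (1.0 - (1.0 + r) ** -months_remaining)


def building_dscr(monthly_income, components):
    """DSCR = total monthly income / sum of payments over all loan components.

    `components` is a list of (balance, annual_rate, months_remaining) tuples.
    """
    total_payment = sum(
        periodic_constant_payment(b, rate, n) for b, rate, n in components
    )
    return monthly_income / total_payment


# Hypothetical building with two loan components.
components = [(150_000.0, 0.03, 240), (50_000.0, 0.0, 100)]
first_payment = periodic_constant_payment(*components[0])
dscr = building_dscr(4_000.0, components)
```

Summing the component payments before taking the ratio mirrors the building-level aggregation described in the text.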
Building variables include property type, geographic location at the NUTS 3 level, and building age category. 15 Building age is defined as the difference between the current loan year and the year the building was built. We categorize building age into 3-year categories because, according to Underwood and Alshawi (2000), this is the shortest maintenance cycle of a building. Due to the panel structure of our data, this variable definition allows us to disentangle the building age component from the EE component in the regressions. This is due to the fact that EE variation remains within certain age categories. 16 Borrower-level information includes total income, defined as the sum of primary and secondary income, and the borrower's age at origination of the earliest loan component. We categorize the total income across high, medium and low tercile groups. These variables are designed to account for household characteristics that may influence both the decision to purchase an energy efficient home and mortgage default risk.
For instance, older, wealthier, and more financially literate households might be more likely to acquire a home with better energy performance while having a low-risk credit profile.

14 In cases where DTI is missing, we approximate the variable as the ratio between total original balance per building and total household income. This procedure seems reasonable because the average absolute difference between the reported contemporaneous DTI and the approximated DTI is small, only 0.23 relative to its mean of 3.65 (cf. Table 4, Panel A).

15 The Nomenclature of Territorial Units for Statistics (NUTS) is a geocode standard for referencing the subdivisions of countries for statistical purposes. For each EU member country, a hierarchy of three NUTS levels is established by Eurostat in agreement with each member state. Among the three levels, the NUTS 3 codes refer to the most granular region specification and are the usual reference for specific diagnoses and policy analyses.
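The building age definition and its 3-year categories can be sketched as follows. The alignment of the bins to multiples of three (e.g., ages 24 to 26) is an assumption taken from the paper's two-building illustration; the function names are hypothetical.

```python
def building_age(loan_year, construction_year):
    """Building age as current loan year minus construction year."""
    return loan_year - construction_year


def age_category(age, width=3):
    """Return the (lower, upper) bounds of the age bin containing `age`.

    Assumption: bins start at multiples of `width`.
    """
    lower = (age // width) * width
    return (lower, lower + width - 1)


# A C-rated building from 1991 observed in 2015 and a B-rated building
# from 1992 observed in 2016 share the same age and the same category.
age_c = building_age(2015, 1991)
age_b = building_age(2016, 1992)
same_bin = age_category(age_c) == age_category(age_b)
```

Because both buildings land in the same bin while carrying different ratings, the age control does not absorb the EE variation.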
To control for general macroeconomic conditions, we include the quarterly Dutch unemployment rate, the end-of-month 10-year German government bond yields, the monthly standard deviation of 10-year German bond yields, and the yield curve slope, defined as the difference between 10- and 1-year EUR swap rates. The variables are obtained from Bloomberg.

Summary statistics at the property level are presented below. Table 4 provides summary statistics on key borrower, property, and mortgage characteristics as a one-time cross-sectional snapshot using the most recently reported values. The table differentiates between non-defaulted (Panel A) and defaulted (Panel B) mortgages. Within both panels, we also differentiate between energy-efficient (EE = 1) and energy-inefficient (EE = 0) buildings. A dwelling is considered EE if it has an A or B rating. Beginning with borrower characteristics, age at mortgage origination does not appear to differ substantially between EE and non-EE mortgages. However, younger borrowers are significantly more likely to experience default than older borrowers. In terms of income, EE-building borrowers have higher overall total household income for both defaulted and non-defaulted loans, while defaulted borrowers generally have relatively lower annual income. The construction year of buildings varies between EE and non-EE by definition. More recently constructed buildings are EE. About 68% of buildings are detached houses, while 17% are apartments and 15% are terraced houses in the sample (results are not reported in the table). Average interest rates and original LTV are higher for defaulted and non-EE mortgages. In line with the findings of An and Pivo (2020), we observe that on average LTV is lower for energy efficient dwellings, suggesting that EE is associated with higher property value.
Figure 1 shows the distribution of mortgages according to the buildings' year of construction (Panel A), the total original balance (Panel B), and the earliest origination year (Panel C). It is worth noting that our data set is well diversified according to buildings' construction year. In addition, we have a considerable number of mortgages older than ten years. This is an important feature since defaults typically do not occur in the first few years after origination. Unreported statistics on market and economic variables indicate that the average quarterly Dutch unemployment rate for the period January 2014 to May 2018 is about 6.47%. For the same period, the average yield on 10-year German government bonds is 0.46%, their average monthly standard deviation is 0.095%, and the average difference between 10- and 1-year Euro swap rates amounts to 0.963%.

16 To illustrate this, consider two buildings: a C-rated building constructed in 1991 and a B-rated building constructed in 1992. Also assume that the mortgages on the two buildings default in different years: the C-rated building defaults in 2015, while the B-rated building defaults in 2016. Consequently, both buildings are 24 years old at the time of their respective default events, leading to a variation of the EE variable within the sample of 24-year-old buildings. In addition, both buildings fall into the same 3-year age category (i.e., age 24 to 26), again leading to variation in the EE variable within that age category. Consequently, the control variable building age does not fully absorb the EE component, allowing the EE variable to be used as a separate predictor of default.

Methodology
The industry standard for estimating mortgage default risk is the logistic regression model. Among the more sophisticated techniques, survival analysis, in particular the Cox model, is a popular alternative because it allows us to account for the time-series nature of a given data set. Given the panel structure of the data provided by ED, we present and employ both estimation techniques in our analysis.

Logistic Regression
An approach frequently used in the literature to study the relationship between borrower-level loan information and the probability of mortgage default is logistic regression (see, e.g., Altman and Saunders 1997; Episcopos et al. 1998; Qi et al. 2020). Its main difference from the linear regression model is that the dependent variable is a latent variable and that only the binary outcome variable $Y$, i.e., the default event, can be observed. At a random point in time, $Y$ takes the value one if the event occurs and zero otherwise. The probability distribution of $Y$ is modeled as

$$P(Y_i = 1 \mid x_i) = \frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)}, \tag{1}$$

where $Y_i$ is equal to one if at least one of the mortgage components on building $i$ has experienced a repayment delay for at least three months in a row, and zero otherwise. The results of logistic and logit regressions are equivalent since both are obtained through the maximum likelihood estimator. The relationship between the two is that the logistic function is also known as the inverse logit function. For ease of reading, we use the logit model, which does not report estimates as odds ratios. The vector of explanatory variables $x_i$, with coefficient vector $\beta$, contains the energy efficiency indicator $EE_i$, which is equal to one if a building has a provisional energy rating A or B, and zero otherwise. All other independent variables fall into one of the four categories: mortgage, building, household, or market control variables. A detailed overview of the covariates is presented in "Choice of Variables and Summary Statistics".

Extended Cox Model
The Cox regression is a survival analysis method that aims to estimate the distribution function $f(t)$ of random event times $T$, where the event is the default date of a loan component. Following the standard terminology, a generic survival model is represented by a survival function $S(t)$ and a hazard function $h(t)$, which are defined as

$$S(t) = P(T > t) = \int_t^{\infty} f(u)\,du, \tag{2}$$

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, \tag{3}$$

where $S(t)$ is a monotone decreasing function in $t$ with the limits $\lim_{t \to 0} S(t) = 1$ and $\lim_{t \to \infty} S(t) = 0$. The survival function models the probability that a loan will survive beyond a threshold period $t$. The hazard function $h(t)$ represents the instantaneous risk or conditional probability that a default will occur at time $t$ given that the loan has survived up to time $t$. A convenient way to express the relationship between the survival function $S(t)$ and the hazard function $h(t)$ is to introduce the cumulative hazard function $H(t) = \int_0^t h(u)\,du$. Using the relationship given in Eq. (3), it is then straightforward to show that $H(t) = -\ln\{S(t)\}$. The cumulative hazard function can be interpreted as the total amount of risk accumulated up to time $t$. Cox (1972) proposes a proportional hazards model with the hazard function being defined as the product of a positive baseline hazard rate $h_0(t)$ and the exponential of a linear function of explanatory variables $x_i$:

$$h(t \mid x_i) = h_0(t)\exp(x_i^{\top}\beta), \tag{4}$$

where $x_i$ is a vector of time-fixed covariates that are associated with building $i$ and $\beta$ is a vector of the corresponding regression coefficients. This basic form of the Cox model can be extended to allow for time-varying covariates:

$$h(t \mid x_i(t)) = h_0(t)\exp\{x_i(t)^{\top}\beta\}, \tag{5}$$

where the vector $x_i(t)$ consists of $p$ time-independent and $q$ time-dependent predictor variables. The graphical visualization of empirical survival functions can shed light on whether the proportional hazards assumption holds. The empirical survival function is typically depicted using the Kaplan and Meier (1958) method. It is an accepted non-parametric approach that assumes that censoring time is independent of an individual's behaviour.
The empirical function is defined as

$$\hat{S}(t) = \prod_{m:\, t_m \le t} \hat{P}(T > t_m \mid T \ge t_m), \tag{6}$$

where $t_m$ are the ordered event times and the probabilities are approximated by the frequency distribution in the data set.
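The product-limit form of the empirical survival function can be made concrete with a short sketch. It approximates each conditional survival probability by one minus the share of at-risk loans that default at that event time. The toy durations are hypothetical, and left-truncation is ignored here for simplicity.

```python
import math


def kaplan_meier(durations, events):
    """Return [(t_m, S_hat(t_m))] at the ordered distinct event times.

    durations: time to default or censoring for each loan.
    events:    1 if the loan defaulted at that time, 0 if censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    surv, curve = 1.0, []
    for t_m in event_times:
        at_risk = sum(1 for t in durations if t >= t_m)
        failed = sum(1 for t, e in zip(durations, events) if e and t == t_m)
        surv *= 1.0 - failed / at_risk  # estimate of P(T > t_m | T >= t_m)
        curve.append((t_m, surv))
    return curve


# Hypothetical toy sample: six loans, three observed defaults.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(durations, events)

# Cumulative hazard implied by H(t) = -ln S(t).
cum_hazard = [(t, -math.log(s)) for t, s in curve]
```

Plotting such curves separately for EE and non-EE loans gives the visual check of proportionality mentioned in the text.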
In survival analysis, the observation period is typically limited to a defined period of time, so it is important to differentiate between the time (i) when a subject is first at risk, (ii) when a subject is under observation, and (iii) when a subject experiences failure. A subject is said to be left-truncated if the date on which it first becomes at risk precedes the start of the observation period. In the case of loans, left-truncation applies to loans that were originated before the first observation date. Among the left-truncated loans, we can observe only those that survived until the beginning of the study, while we have no information on the loans that defaulted prior to the observation period. In general, left-truncated subjects may be included in the analysis, but it is important to take into account the subjects' time of exposure to risk when they come under observation (i.e., to account for the loan's age at the beginning of the study). A subject is said to be right-censored if the date of failure is unobservable due to either the subject's early exit from the study or the early termination of the study. When applied to loan analysis, for example, we cannot observe the future default date of a loan that is still being paid off at the end of the observation period. For a loan that was paid off during the observation period, on the other hand, the date of the last payment is taken as the censoring date, since a default could have occurred at some later point had the loan term been sufficiently long. The common practice to correct for censoring is to introduce a dummy variable for censored observations.
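The handling of left-truncation and right-censoring can be sketched as a small record builder. It measures time as loan age in months, records the age at which a loan enters observation, and sets an event indicator that is zero for censored loans. The field names and the example dates are hypothetical; only the observation window (January 2014 to May 2018) comes from the paper.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def survival_record(origination, obs_start, obs_end,
                    default_date=None, payoff_date=None):
    """Build (entry, exit, event) in loan-age months for one loan.

    entry > 0 marks a left-truncated loan; event = 0 marks right-censoring
    (either at payoff or at the end of the observation period).
    """
    entry = max(0, months_between(origination, obs_start))
    if default_date is not None and default_date <= obs_end:
        exit_date, event = default_date, 1
    elif payoff_date is not None and payoff_date <= obs_end:
        exit_date, event = payoff_date, 0
    else:
        exit_date, event = obs_end, 0
    return {"entry": entry,
            "exit": months_between(origination, exit_date),
            "event": event}


OBS_START, OBS_END = date(2014, 1, 1), date(2018, 5, 1)

# Left-truncated loan still performing at the end of the study.
r1 = survival_record(date(2010, 1, 1), OBS_START, OBS_END)
# Left-truncated loan that defaults in March 2016.
r2 = survival_record(date(2010, 1, 1), OBS_START, OBS_END,
                     default_date=date(2016, 3, 1))
# Loan originated inside the window and paid off in January 2017.
r3 = survival_record(date(2015, 6, 1), OBS_START, OBS_END,
                     payoff_date=date(2017, 1, 1))
```

The entry age is what allows a left-truncated loan to join the risk set only from the age at which it is first observed.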

Empirical Results
The following section presents the regression results and the associated robustness checks.

Logit Regression
The logit regression model is appropriate for modeling a binary outcome without considering the time dimension. Since our data set provides a quarterly time series of mortgage information, we resort to the following procedure for eliminating the time dimension. For mortgages with no defaults in the sample, we take the most recent quarter for the regression analysis. For defaulted mortgages, on the other hand, we identify the quarter in which the mortgage was first reported to be three or more months in arrears. We then use the information reported in this quarter in the logit regression for defaulted mortgages. Table 5 presents the estimates. Column 1 reports the results without controlling for other characteristics in the model. The EE estimate of −0.7150 suggests that energy efficiency has a negative and highly significant relationship with default risk. The marginal effect of EE implies that a mortgage on an energy-efficient building is associated with a 34 basis point (bp) reduction in the probability of default relative to its non-efficient counterpart. 17 This reduction is about one half of the actual default probability, given that the default rate for a non-EE building is 0.66% on average (see Table 3). Since this result could be driven by various mortgage, building, or household characteristics, we include the corresponding control variables. One of the most critical issues in this analysis stems from the provisional rating table. As mentioned earlier, rating categories of the buildings are constructed by RVO based on building type and construction year period. This means that the results may not be driven by the actual rating, but by either the building type or the age of the building. To disentangle the energy efficiency effect from other building characteristics, both building type and current 3-year age category are used as control variables.
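The procedure for eliminating the time dimension, described at the start of this subsection, can be sketched as follows. The sketch assumes a panel held as a list of dictionaries with hypothetical keys `building_id`, `quarter` (a sortable string such as "2015Q2"), and `months_in_arrears`; these names are illustrative and not taken from the underlying data set.

```python
def collapse_to_cross_section(panel):
    """Keep one row per building.

    Defaulted buildings: the first quarter with >= 3 months in arrears.
    Non-defaulted buildings: the most recent quarter.
    """
    by_building = {}
    for row in sorted(panel, key=lambda r: (r["building_id"], r["quarter"])):
        by_building.setdefault(row["building_id"], []).append(row)

    cross_section = []
    for rows in by_building.values():
        first_default = next(
            (r for r in rows if r["months_in_arrears"] >= 3), None
        )
        chosen = first_default if first_default is not None else rows[-1]
        cross_section.append({**chosen, "default": int(first_default is not None)})
    return cross_section


# Hypothetical panel: building 1 defaults in 2015Q2, building 2 never does.
panel = [
    {"building_id": 1, "quarter": "2015Q1", "months_in_arrears": 0},
    {"building_id": 1, "quarter": "2015Q2", "months_in_arrears": 3},
    {"building_id": 1, "quarter": "2015Q3", "months_in_arrears": 5},
    {"building_id": 2, "quarter": "2015Q1", "months_in_arrears": 0},
    {"building_id": 2, "quarter": "2015Q2", "months_in_arrears": 1},
]
cross_section = collapse_to_cross_section(panel)
```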
Additionally, we control for household (total income tercile and borrower age at origination) and mortgage characteristics (LTV, DSCR, loan term). Further, we include region fixed effects at the NUTS 3 level and year fixed effects. Column 2 shows that the negative relation between energy efficiency and the probability of default remains significant and quantitatively sizeable, with an estimated coefficient of −1.3408. It is worth mentioning that LTV might be negatively affected by EE (due to the higher building value associated with EE). This means that the negative and significant coefficient captures the EE effect on top of the value channel. Adding market controls (i.e., unemployment, government bond yields, government bond yield volatility, and yield curve slope) and clustering the standard errors at the NUTS 3 region level does not affect the findings, as reported in the last two model specifications. In the most restrictive model specification 4, EE's marginal effect on PD is about -39 bps.

We validate the above findings with a series of robustness checks. For this purpose, we take specification 4 in Table 5 as the baseline model and replace, redefine, or add covariates as described further below. The additional model specifications are presented in Table 6, where we report for convenience only the regression coefficient associated with the EE dummy variable. Since it is common to estimate a credit risk model with original covariates, we replace the explanatory variable current LTV with original LTV. As presented under model specification 1, the main results are not driven by the covariate's reporting date. Models 2 to 5 show that the results are not affected by the definition of the building and borrower age category. In models 2 and 4, we use the actual building and borrower age, respectively. In models 3 and 5, we redefine the age categories from 3- and 5-year categories to 9- and 15-year categories for building and borrower age, respectively. The baseline regression was estimated by omitting contemporaneous DTI and original balance. The reason is that DTI and DSCR measure the same information (i.e., borrower's ability to repay the debt), while total original balance has a correlation of 0.73 with total income. Consequently, we substitute DSCR with DTI in model 6 and add original balance in model 7. Finally, in model 8, both covariates, DTI and original balance, enter the regression equation, while DSCR is excluded. As presented in Table 6, none of these model specifications affect the main result.

17 The marginal effect measures the discrete change of the predicted default probabilities when the binary EE variable changes from 0 to 1.

Table 5 notes: The dependent variable is a dummy that equals one if the mortgage is in default (i.e., in arrears for at least three months) and zero otherwise. The explanatory variables are the dummy variable EE, which equals one if a building's energy efficiency rating is A or B and zero otherwise, current LTV ratio, current DSCR, and mortgage term in months. Dwelling controls are property type and contemporaneous 3-year building age category variable. Household controls include 5-year borrower age category variable at mortgage origination and total household income tercile mid and low groups, IncQ2 and IncQ3. Market controls are the Dutch unemployment rate, 10-year German government bond yields, the standard deviation of the bond yields, and the yield curve slope of EUR swap rates. Additional control variables are current loan year fixed effects and NUTS 3 region fixed effects. Standard errors are either robust or clustered at regional level and reported in square brackets. Statistical significance is denoted by ***, **, and * at the 1%, 5%, and 10% level, respectively.

Table 6 notes: The dependent variable is a dummy that equals one if the mortgage is in default (i.e., in arrears for at least three months) and zero otherwise. The baseline model specification accounts for mortgage, dwelling, household, and market controls, including year and region fixed effects, and standard errors clustered at regional level (see Table 5, model 4). The model specifications 1 to 8 in this table differ from the baseline model according to the following changes. Spec. 1: the explanatory variable current LTV is replaced by original LTV. Spec. 2: contemporaneous 3-year building age category is replaced by actual contemporaneous building age. Spec. 3: contemporaneous 3-year building age category is replaced by contemporaneous 9-year building age category. Spec. 4: 5-year borrower age category is replaced by actual borrower age at origination of earliest loan component. Spec. 5: 5-year borrower age category is replaced by 15-year borrower age category. Spec. 6: current DSCR is replaced by current DTI. Spec. 7: original balance is added to the baseline model. Spec. 8: current DSCR is replaced by current DTI and original balance is added to the baseline model. Column 2 (3) reports the estimated regression coefficient (standard errors) for the EE (A/B) dummy variable. Statistical significance is denoted by ***, **, and * at the 1%, 5%, and 10% level, respectively.
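As a rough plausibility check on the marginal effect reported for column 1 of Table 5, the discrete change in predicted default probability can be computed by hand. The sketch assumes that, in a logit with only an intercept and the EE dummy, the predicted probability for non-EE buildings equals their average default rate of 0.66%; that evaluation point is an assumption made here, not a statement of how the authors computed their figure.

```python
import math


def expit(z):
    """Inverse logit (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def discrete_marginal_effect(p_baseline, beta_ee):
    """Change in predicted probability when EE switches from 0 to 1."""
    return expit(logit(p_baseline) + beta_ee) - p_baseline


# Inputs taken from the text: 0.66% non-EE default rate, coefficient -0.7150.
me_bps = discrete_marginal_effect(0.0066, -0.7150) * 1e4  # in basis points
```

Under these assumptions the result is close to the reported reduction of 34 bps, i.e., roughly half of the non-EE default rate.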

Extended Cox Model
The Cox model is typically employed to study survival data over time. Since our data set allows us to track mortgage health periodically, we employ the extended Cox model with time-varying covariates for the period January 2014 to May 2018. Before presenting the regression results, it is important to confirm whether the proportional hazards assumption holds, as this could affect the interpretation of the results. Figure 2 presents the empirical survivor functions for energy-efficient and non-energy-efficient mortgages. From a visual analysis, we observe that the two curves neither cross nor diverge too much, suggesting that the proportionality assumption holds. The implication of this finding is that the estimated coefficients for the EE variable can be assumed to be constant over time, which means that the estimates do not depend on the reporting time of the last observed value. Additionally, the survivor curves suggest that energy-efficient mortgages survive for longer on average than their non-efficient counterparts.
To further explore the observed relationship between EE and survival time, we run the extended Cox regression with time-varying covariates and present the results in Table 7. Model 1 reports the estimated log hazard ratios without controlling for any mortgage and other characteristics. The regression coefficient is negative and highly significant (−0.5652), confirming the findings from the logit regression. Also from the other three model specifications, we can conclude that the time-varying nature of the covariates does not qualitatively affect the inverse relationship between EE and PD. EE's marginal effects, however, are more pronounced compared to the logit findings, ranging between -74 (model 1) and -66 bps (model 4). These higher values suggest that EE unfolds its full PD-reducing capacity when the exogenous variables' time-varying nature is taken into account. Among the time-varying covariates are current LTV, DSCR, loan term, and the macroeconomic variables. It is evident that the former two variables vary over time as the individual loan components are being repaid. The loan term varies less frequently, as it only changes when additional loan components are added to those already in place.
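Estimating a Cox model with time-varying covariates usually requires the panel to be arranged in a counting-process layout, with one row per (start, stop] interval. The sketch below shows one possible arrangement under two assumptions that the paper does not spell out: covariates reported at the start of an interval apply over that interval, and a default flagged at a report ends the interval closing at that report. All names and values are hypothetical.

```python
def to_counting_process(reports):
    """Convert periodic reports for one loan into (start, stop] rows.

    reports: list of (loan_age_months, covariates_dict, defaulted_flag),
             sorted by loan age.
    """
    rows = []
    for (start, cov, _), (stop, _, defaulted) in zip(reports, reports[1:]):
        rows.append({"start": start, "stop": stop,
                     "event": int(defaulted), **cov})
        if defaulted:
            break  # the loan leaves the risk set after its first default
    return rows


# Hypothetical left-truncated loan first observed at age 48 months.
reports = [
    (48, {"ltv": 0.90, "dscr": 2.1}, False),
    (51, {"ltv": 0.88, "dscr": 2.0}, False),
    (54, {"ltv": 0.87, "dscr": 1.4}, True),
    (57, {"ltv": 0.87, "dscr": 1.3}, True),
]
rows = to_counting_process(reports)
```

Time-fixed covariates such as the EE dummy would simply be repeated on every row of the loan.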
To validate these results, we apply similar robustness exercises as in the case of the logit regression. The only difference is that model specification 1 is omitted, since the main property of the Cox regression is to include original as well as current covariate values in the regression analysis. Table 8 presents the robustness results. The estimates suggest that neither the redefinition of borrower and building age categories (models 2 through 5), nor the inclusion of additional covariates that might raise multicollinearity concerns (models 7 and 8) affects the main finding. The results are quantitatively and qualitatively similar to the baseline estimate.

Table 7 notes: The dependent variable is a dummy that equals one if the mortgage is in default (i.e., in arrears for at least three months) and zero otherwise. The explanatory variables are the dummy variable EE, which equals one if a building's energy efficiency rating is A or B and zero otherwise, current LTV ratio, current DSCR, and mortgage term in months. Dwelling controls are property type and contemporaneous 3-year building age category variable. Household controls include 5-year borrower age category variable at mortgage origination and total household income tercile mid and low groups, IncQ2 and IncQ3. Market controls are the Dutch unemployment rate, 10-year German government bond yields, the standard deviation of the bond yields, and the yield curve slope of EUR swap rates. Additional control variables are current loan year fixed effects and NUTS 3 region fixed effects. Standard errors are either robust or clustered at regional level and reported in square brackets. Statistical significance is denoted by ***, **, and * at the 1%, 5%, and 10% level, respectively.
In addition, we investigate the extent to which the degree of energy efficiency plays a role in credit risk. The findings suggest that mortgages on more efficient buildings are less prone to default. For brevity, we include the results in Appendix B.

Table 8 notes: The dependent variable is a dummy that equals one if the mortgage is in default (i.e., in arrears for at least three months) and zero otherwise. The baseline model specification accounts for mortgage, dwelling, household, and market controls, including year and region fixed effects, and standard errors clustered at regional level (see Table 7, model 4).

Economic Mechanism
As shown by Zancanella et al. (2018), the benefits of energy-efficient buildings come from savings rather than income. Borrower savings, such as reduced energy bills and home insurance costs (i.e., green home insurance is less expensive than insurance for non-efficient buildings), should result in more disposable income in the event of emergencies or unexpected events. 18 To measure whether such an effect exists, we decompose the EE variable according to the income tercile group into the interaction terms IncQj×EE, where IncQj i is equal to one if the individual i belongs to the tercile group j, and zero otherwise. Table 9 reports the energy rating distribution according to the income tercile group. The energy-efficient buildings (A/B rating) account for 32.61% of the total sample, 43.45% of the high-income group (INCQ1), 32.66% of the middle-income group (INCQ2), and 21.71% of the low-income group (INCQ3). The EE share decreases by approximately 11 percentage points when moving from a higher to a lower income group.
As expected, within the same rating category, the loan defaults increase for lower income classes. For instance, the default rate for the rating class A is equal to 0.22 (INCQ1), 0.24 (INCQ2), and 0.34 (INCQ3). Table 10 presents the regression results for the logit model. All the other explanatory variables remain unchanged with respect to the previous estimates. It is worth noting that IncQ1×EE is not significant for model specifications 1 and 2, while it is negative and significant at the 10% level for models 3 and 4, with an estimated coefficient of −1.4613. Interestingly, IncQ2×EE and IncQ3×EE show negative and significant estimated coefficients in all four proposed specifications. The magnitude of the coefficient for IncQ3×EE is always higher than for IncQ2×EE and IncQ1×EE in models 3 and 4. 19 In the last two cases, the marginal effect of EE according to the income group shows a decrease in the probability of default equal to −39 bps (IncQ1×EE), −45 bps (IncQ2×EE), and −46 bps (IncQ3×EE) relative to the non-efficient counterpart. Considering that the average default rate for IncQ3 is 0.93%, the reduction in terms of default probability is economically significant and is half the average default probability for low-income borrowers. This suggests that energy efficiency better mitigates the default risk of borrowers with lower incomes. In this regard, the economic channel is represented by savings that come from reduced costs, which have a greater relative impact on the borrower with less disposable income.

18 In fact, it is estimated that more than 50 million Europeans cannot benefit from such savings because they do not have access to energy-efficient housing and are thus affected by energy poverty (European Commission 2019). Energy poverty refers to inadequate access to warmth, cooling, lighting, and the energy to power appliances, which affects people's health and well-being. Zhao et al. (2018) document that improved energy performance can result in savings of up to 9.3% of annual income for extremely low-income households. James and Ambrose (2017) find that the most efficient reduction in energy consumption among low-income households can be achieved through a combination of EE retrofits and behavioural modification measures.

Table 9 notes: Column 2-4-6-8 provide the percentage share of each rating category within the total sample of loans. Column 3-6-9-12 state the share of defaulted loans within each rating category.

Table 10 notes: The dependent variable is a dummy that equals one if the mortgage is in default (i.e., in arrears for at least three months) and zero otherwise. The explanatory variables are the dummy variable IncQi×EE that equals one if a building's energy efficiency rating is A or B and the borrower belongs to the tercile group i, and zero otherwise. Mortgage controls are current LTV ratio, current DSCR, and mortgage term in months. Dwelling controls are property type and contemporaneous 3-year building age category variable. Household controls include 5-year borrower age category variable at mortgage origination and total household income tercile mid and low groups, IncQ2 and IncQ3. Market controls are the Dutch unemployment rate, 10-year German government bond yields, the standard deviation of the bond yields, and the yield curve slope of EUR swap rates. Additional control variables are current loan year fixed effects and NUTS 3 region fixed effects. Standard errors are either robust or clustered at regional level and reported in square brackets. Statistical significance is denoted by ***, **, and * at the 1%, 5%, and 10% level, respectively.
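The decomposition of the EE dummy by income tercile can be sketched as a set of interaction dummies. The tercile cut points below are hypothetical placeholders, since the paper does not report them; the numbering follows the text, with IncQ1 as the high-income and IncQ3 as the low-income group.

```python
def income_tercile(income, cut_low, cut_high):
    """Return 1 (high), 2 (middle), or 3 (low), matching IncQ1 to IncQ3.

    cut_low and cut_high are assumed tercile boundaries of total income.
    """
    if income >= cut_high:
        return 1
    if income >= cut_low:
        return 2
    return 3


def interaction_terms(ee, tercile):
    """IncQj x EE dummies: one only for an EE building in tercile j."""
    return {f"IncQ{j}xEE": int(ee == 1 and tercile == j) for j in (1, 2, 3)}


# Hypothetical cut points and borrowers.
CUT_LOW, CUT_HIGH = 40_000, 70_000
low_income_ee = interaction_terms(1, income_tercile(30_000, CUT_LOW, CUT_HIGH))
high_income_non_ee = interaction_terms(0, income_tercile(90_000, CUT_LOW, CUT_HIGH))
```

Because each borrower belongs to exactly one tercile, the three interaction dummies sum to the original EE indicator.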
For completeness, Table 11 shows the estimates for the extended Cox model. The results are similar to the logit results for model specification 1 where IncQ1×EE is (IncQ2×EE) relative to those belonging to the first (IncQ1×EE). However, we do not find such evidence for IncQ3×EE. Nevertheless, there is still a significant mitigation effect on the probability of default in the Cox model relative to the non-EE counterpart. This implies that the lowest income group IncQ3 still benefits from energy efficiency in terms of lower default risk compared to the non-EE income classes, but the effect is smaller in magnitude compared to the higher income classes IncQ1 and IncQ2 in the EE group. It is worth noting that in the baseline specification 1 in Table 11, the income effect is also present for IncQ3 and the magnitude of the corresponding coefficients decreases when the other time-varying covariates are introduced in the Cox model. One possible explanation for this could be attributed to the borrower's income, which is only available at the time of loan origination (i.e., it is a time-independent variable that does not change over time as the others). Therefore, the effect on the coefficient IncQ3×EE might be partially absorbed by other time-dependent controls such as the DSCR and the current LTV ratio.
In summary, our findings confirm a mitigation effect on the default probability with respect to the non-EE counterpart. Both models suggest that lower-income households benefit more from energy efficiency than higher-income households, which supports the income channel effect (e.g., through the increase of the disposable income). However, the findings for the lowest-income group survive after the inclusion of control variables only in the logit model but not in the Cox model. As discussed above, this could be attributed to the borrower income, which is observable only at the loan's origination date.

Conclusion
This study identifies a relationship between building energy efficiency and mortgage default risk. We use a unique data set consisting of Dutch loan-level data supplemented with provisional building energy efficiency ratings obtained from RVO's rating categories. In the empirical analyses, we exploit the panel structure of the data set, the technological progress, and the non-simultaneous changes in energy efficiency ratings across construction years and building types to disentangle the energy efficiency component from type- and age-specific effects typically associated with borrower default risk.
We make use of two empirical methodologies, the logit regression and the extended Cox model, and find that energy efficiency is negatively related to a borrower's likelihood of defaulting on mortgage payments. The results hold after accounting for borrower, mortgage, and market control variables. A series of robustness checks confirms that the findings are not driven by any particular assumptions. As a consequence, the discriminatory power of a model using both the usual credit variables and the EE variable significantly exceeds that of models that only use the traditional credit variables. This suggests that EE ratings complement rather than substitute borrower credit information and that a lender who uses information from both sources (borrower credit information and EE ratings) can make superior lending decisions compared to lenders who do not exhaust all available information. Furthermore, we investigate whether there is evidence, on top of unobservable characteristics of the borrower, of any economic mechanism that mitigates the default risk of lower-income borrowers. The income channel from the logit regression shows that savings coming from reduced costs, such as energy bills and home insurance costs, have a greater impact in relative terms on borrowers with less disposable income. However, the finding for the lowest income group is not confirmed in the Cox regression. This could be due to the fact that household income, unlike LTV and DSCR, is observable only at the loan's origination date. Nevertheless, the economic channel is confirmed in the mitigation of the default risk for the average household.
These aspects are not only crucial for shaping future energy policy, but also have implications for the risk management of European financial institutions. In fact, lower default risk for mortgages on energy-efficient residential buildings could imply different mortgage pricing (i.e., lower interest rates).
The presented findings are a first step in understanding whether and to what extent energy efficiency plays a role in the European mortgage market.
further and consider the actual degree of energy efficiency. Following the findings of Kaza et al. (2014), we hypothesize that the more efficient buildings are associated with a relatively lower risk of default.
For the analysis, we construct a new categorical variable that aggregates the energy efficiency ratings into four efficiency classes. Efficiency class 1 comprises energy ratings A and B, class 2 comprises ratings C and D, class 3 comprises ratings E and F, and class 4 is reserved for G-rated buildings. All other explanatory variables remain unchanged. Table 12 presents the regression results for both regression methodologies. We observe that the findings are less pronounced than in the main analysis. Overall, the estimates for rating classes 1 to 3 exhibit an increasing pattern with the degree of energy inefficiency: the higher (i.e., more efficient) the rating, the lower the associated risk of default. However, the explanatory power of these results diminishes with the inclusion of additional control variables. This could be attributed to the inherent imprecision of the ratings in the constructed data set. In the main analysis, we can assume that the general classification of buildings into the two categories "energy efficient" and "energy inefficient" is more or less accurate. Any misspecification is likely to occur only at the B- and C-rating thresholds and is negligible due to the law of large numbers, as long as the number of observations is large enough.
In the analysis of the degree of efficiency, however, two additional rating thresholds are added (at the D/E and the F/G threshold). This leaves additional room for misspecification and can thus lower the significance of the estimated findings. That is, the presented findings are indicative of a relation between the degree of energy efficiency and credit risk. However, only an exact matching between the mortgage data and the building's energy rating will provide true insights into this issue. We leave this for future research.
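The aggregation of energy ratings into four efficiency classes described above can be sketched as follows. This is a minimal Python illustration, not the code used in the study (which was run in Stata). The names `RATING_TO_CLASS`, `efficiency_class`, and `class_dummies` are hypothetical, and the use of class 4 (G-rated buildings) as the reference category is an assumption based on the text reporting estimates for classes 1 to 3 only.

```python
# Mapping of energy ratings to the four efficiency classes from the text:
# class 1 = A, B; class 2 = C, D; class 3 = E, F; class 4 = G.
RATING_TO_CLASS = {
    "A": 1, "B": 1,
    "C": 2, "D": 2,
    "E": 3, "F": 3,
    "G": 4,
}


def efficiency_class(rating: str) -> int:
    """Return the efficiency class (1 to 4) for an energy rating A to G."""
    key = rating.strip().upper()
    if key not in RATING_TO_CLASS:
        raise ValueError(f"unknown energy rating: {rating!r}")
    return RATING_TO_CLASS[key]


def class_dummies(rating: str, reference: int = 4) -> dict:
    """Return 0/1 indicator variables for all classes except the reference.

    Assumption: class 4 (G-rated) is the omitted reference category.
    """
    cls = efficiency_class(rating)
    return {
        f"ee_class_{k}": int(cls == k)
        for k in (1, 2, 3, 4)
        if k != reference
    }


if __name__ == "__main__":
    for r in ["A", "d", "F", "G"]:
        print(r, efficiency_class(r), class_dummies(r))
```

The resulting indicator variables would replace the binary EE variable of the main analysis in the logit and Cox specifications, with all other explanatory variables left unchanged.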
Funding Open access funding provided by Università Ca' Foscari Venezia within the CRUI-CARE Agreement. EU Horizon 2020 project Energy efficient Mortgages Action Plan (EeMAP), grant agreement No 746205.
Data Availability Loan-level data are proprietary (European DataWarehouse). Energy Efficiency data are available by contacting Rijksdienst voor Ondernemend Nederland (Netherlands Enterprise Agency).

Code Availability
The logit and Cox regressions were performed using Stata.

Declarations
Conflict of Interests The authors declare no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.