Introduction

The subprime mortgage crisis serves as a powerful reminder of the seismic impact that the financial behavior of homeowners can exert on the U.S. financial system and economy. In the aftermath of the financial crisis, a voluminous literature has developed that aims to shed light on a key relationship in the run-up to the crisis: the interdependence between downward spiraling house prices and rising mortgage default rates. A better grasp of this issue was a matter of urgency during the housing market downturn as policymakers evaluated initiatives to curb the wave of foreclosures and help ‘underwater’ homeowners to stay in their homes (Calomiris et al. 2013; Foote et al. 2008). Yet, the topic remains high on the public policy agenda as it lays bare the tension between housing affordability and financial stability, and carries implications for mortgage market design and macro-prudential regulation.

In the post-crisis period, there has also been substantial interest in the development of mortgage default risk indicators which can serve as a “warning signal” for future turmoil in housing and mortgage markets. The construction of such forward-looking sentiment indices from household survey data (such as the consumer sentiment survey of the University of Michigan), however, has proven elusive. Household surveys are constrained with respect to geographical coverage and number of participants. Their reliability is further complicated by the reluctance of respondents to truthfully answer sensitive questions, particularly those related to their financial affairs (Singer and Ye 2013), and hence they are of limited use as a predictive tool, particularly in the context of housing and mortgage markets.

A viable alternative that has increasingly been pursued in recent research is the creation of sentiment indices from internet search queries. Da et al. (2011, 2015) develop an investor sentiment indicator for the stock market while Beracha and Wintoki (2013) and Van Dijk and Francke (2018) create a proxy for housing demand and show that online behavior has predictive power for house prices and liquidity in local residential markets. More recently, Chauvet et al. (2016) construct a mortgage default risk index (MDRI) based on the intensity of online searches for keywords such as “mortgage help” and “foreclosure assistance” captured by Google Trends. They show that this broad-based index predicts house price returns, returns on subprime credit default swaps and other relevant mortgage indicators, and conclude that MDRI “acts as a leading indicator of the most up-to-date, real-time measures of housing market performance.”

Despite the advantages of MDRI as a predictive tool relative to survey-based alternatives, little is known about the identity, reasons, or intentions of the households whose online searches are aggregated in the index. As Chauvet et al. (2016) point out, “searches are derived from all households, a universe that includes both owners and renters,” yet, it may be assumed that “the bulk of such searches likely emanate from property owners as they are most likely to be concerned with mortgage default.”

While this assumption seems plausible, it is unknown how households process the information they gather in their online searches. One possibility, suggested by Chauvet et al. (2016) is that MDRI captures “household concerns about mortgage failure or foreclosure.” Another plausible alternative is that households learn by searching for relevant terms online and condition their behavior on the information they gathered. That is, as a result of the information they collect online, households may adapt their behavior when dealing with financial distress, learning how to take advantage of government programs, or interacting with their mortgage lenders. Tetlock (2007) for example, hypothesizes a similar bi-directional relationship when studying the effect of negative media coverage on investor sentiment: While news printed in the Wall Street Journal might convey investor attitudes toward stocks not yet impounded in asset prices, they might also directly shape investors’ perception of stocks.Footnote 1 Similarly, online searches might divulge information and at the same time convey information to economic agents who then act on this information. Indeed, top results from online searches include information on government programs to avert foreclosure as well as legal information. The mechanism of information acquisition by online searchers, however, is likely different from the one discussed by Tetlock (2007). While investor reaction to media content described by Tetlock is consistent with noise trader theories implying irrational behavior, the information gathering by households via online searches could be rational. Online searches can help households chart an optimal plan of action given the legal and institutional framework available in the state in which they reside as well as provide guidance on how to take advantage of government assistance programs.

A third possible scenario is that some searches are originating from prospective home buyers or home sellers who are trying to time their transactions or from investors trying to form expectations about the future performance of mortgage-related assets. Online searches thus might reflect the expectations about future market trends of this group of agents.

From a theoretical perspective, these three hypotheses are consistent with different causal relationships. The first hypothesis would predict an increase in foreclosures while the second hypothesis would predict a decrease in foreclosures as a result of a surge in the MDRI. The third hypothesis would imply no relationship between MDRI and foreclosures but a negative relationship between MDRI and future house prices as agents reveal their negative expectations about future market trends when searching online. Currently little is known about which of these hypotheses applies to local housing markets as most of the analysis by Chauvet et al. (2016) is conducted at the national level (local level analysis is restricted in terms of geographical coverage and does not account for metropolitan-area specific demographic and economic conditions).

The main objective of this paper is to explore the relationship between MDRI and future house prices and foreclosure rates in local housing markets. We advance previous research by expanding the set of metropolitan areas and accounting for the differences in appreciation rates between house price segments within the same geographical area. Furthermore, we take into account local economic conditions as well as relevant aspects related to mortgage lending at the regional level. Using a large set of metropolitan-area specific fundamental factors, we estimate a long-run equilibrium model and disaggregate house prices into their fundamental (equilibrium) component and bubble (deviation from equilibrium) component. We then study the relationship between MDRI and future house prices as well as their fundamental and bubble components. Further, we use this house price disaggregation to provide a more detailed analysis of the impact of the fundamental and bubble components on foreclosures.

Related Literature

This paper contributes to two distinct strands of literature. The first strand examines the predictive power of online search intensity on real economic activity. Almost a decade ago, Hal Varian (Google’s Chief Economist) suggested that Google Trends data on the search volume for specific keywords helps predict information contained in future government data releases.Footnote 2 Since then academics have explored the predictive power of Google’s Search Volume Index (SVI) in other domains such as business activity and financial markets. Da et al. (2011, 2015) show that SVI captures investor attention and predicts stock prices at 2-week horizons. Beracha and Wintoki (2013) show that search intensity for terms such as “real estate” and “rent” help predict home prices. Chauvet et al. (2016) construct a mortgage default risk index from the search intensity of SVI for terms such as “mortgage assistance” and “foreclosure help,” and show that this index helps predict housing return, mortgage delinquency indicators, and subprime credit default swaps. In this paper, we examine the predictive power of this index for city-level housing appreciation rates in different market segments while taking into account local fundamental factors, mortgage market conditions, and mortgage market legislation in the states in which metropolitan areas are located (i.e. whether mortgage contracts are recourse or non-recourse). In recourse states, lenders can pursue borrowers for the mortgage balance remaining after foreclosed properties are sold while in non-recourse states they cannot.

The second strand of literature, which developed rapidly in the aftermath of the subprime mortgage crisis, explores the impact of house prices on foreclosure rates. Studies on the contributing role of price declines to mortgage defaults examine the extent to which household behavior conforms to the “option-theoretical” model of mortgage default. A key prediction of this theory is that households find it optimal to walk away from their investment as soon as their equity falls below a certain (negative) threshold (Foster and Van Order 1984; Kau et al. 1994). Closely related research on the ‘double trigger’ hypothesis aims to disentangle the contributing role of the strategic motive from that of affordability issues and cash flow problems of households (e.g. income shocks related to job loss, divorce, or unforeseen healthcare expenses). Empirical studies conducted before the financial crisis find that negative equity is indeed a significant determinant of default (see e.g., Deng et al. 2000; Bajari et al. 2008; and Foote et al. 2008). Using data from the financial crisis, Elul et al. (2010) estimate the contributions of negative equity, illiquidity (measured by credit card utilization), unemployment shocks and the existence of a second mortgage to the probability of default. More recently, Kelly and McCann (2016) find that short-term arrears are primarily driven by unemployment, negative income shocks or divorce, while long-term arrears are much more likely to be due to negative equity. Using post-crisis data, Mocetti and Viviano (2017) identify job losses as a primary reason for mortgage delinquencies. Ghent and Kudlyak (2011) find that borrowers are 30% more likely to default in non-recourse states, and that this effect is much stronger for owners of high-value homes. Moreover, Guiso et al. (2013) use survey data to demonstrate that the willingness to default increases in the home-equity shortfall. Further, the exposure to people who recently defaulted for strategic reasons increases default probabilities because it shows that lenders are unlikely to pursue a deficiency judgment against borrowers.

The interdependence between foreclosures and house prices has also received considerable attention in the literature. Lin et al. (2009) explore how the distance to a foreclosed property in space and time impacts home values by focusing on the role of comparables in the price formation process. LaCour-Little et al. (2020) argue that the value of comparables needs to be adjusted for loan assumability, particularly in areas with a higher concentration of Federal Housing Administration insured loans. Calomiris et al. (2013) provide evidence that foreclosures dampen house prices, yet the negative impact of prices on foreclosures is much greater, in line with the theory of strategic borrower behavior. In contrast, Bhutta et al. (2017) find that emotional and behavioral factors are more important in the decision-making process of households than option-theoretic considerations. Gerardi et al. (2018) use data from the Panel Study of Income Dynamics (PSID) to assess the relative importance of negative equity versus ability to pay. While they find that strategic effects are important, changes in the ability to pay (e.g., job losses) have large estimated effects.

In this paper, we add to these studies by disaggregating house prices into their fundamental and bubble components and differentiating between recourse and non-recourse states. Consistent with strategic motives for default, we find that homes are foreclosed at higher rates in Metropolitan Statistical Areas (MSAs) located in non-recourse states. Furthermore, foreclosures increase when fundamental home values decline but are not sensitive to transitory deviations from equilibrium (bubble component of house prices).

The remainder of this paper is organized as follows. In the “Methodology” section, we present the methodology, and in the “Data, Variable Construction, and Summary Statistics” section we describe the data. The empirical results are presented in the “Results” section, and the concluding remarks in the “Conclusion” section.

Methodology

We begin our analysis by estimating a fundamental house price model. We assume that house prices converge toward their equilibrium values in the long run, yet may exhibit deviations from equilibrium in the short run. Furthermore, as different segments of the housing market (i.e. starter homes and trade-up homes) might react differently to changes in fundamentals, we allow for different functional relationships between fundamentals and Top tier and Bottom tier house prices. That is, the relationships between Top and Bottom house price tiers and fundamentals are given by the functions

$$ {P}_{i,t}^{j\ast }={f}_j\left({X}_{i,t}\right) $$
(1)

where \( {P}_{i,t}^{j\ast } \) is the logarithm of the fundamental value of the house in tier j ∈ {T, B} (Top and Bottom) in MSA i in month t, and \( {X}_{i,t} \) is the vector of fundamental variables. Following Abraham and Hendershott (1996) and Capozza et al. (2004), we consider population, income, employment rate, user cost, and construction cost of housing in the MSA as fundamental factors. Further, because house prices are also affected by regional geographical and regulatory constraints, we add the land supply elasticity estimates derived by Saiz (2010) as a fundamental factor.Footnote 3 These supply elasticity indices vary across MSAs but not across time.

The objective of the fundamental model is to estimate the relationships \( {f}_j\left(\cdot \right) \), yet a key concern with the estimation is that the levels of the house price indices and (some of) the fundamental factors might be non-stationary. A standard approach to address this issue is the estimation of an error correction framework, and the literature has proposed various specifications for the long-run relationship between house prices and fundamentals as well as the short-run dynamics of house prices (see, e.g. Drake 1993, Ashworth and Parker 1997, Kasparova and White 2001, and Stevenson 2008). In this paper, we estimate versions of the error correction mechanism proposed by Abraham and Hendershott (1996). This estimation method accounts for the serial correlation and the mean reversion in the time series of US housing returns that are widely documented in the literature (see, e.g. Case and Shiller 1989, 1990).

We denote the actual appreciation rates of house prices (i.e. continuously compounded returns) of the two house tiers by \( {\Delta P}_{i,t}^j={P}_{i,t}^j-{P}_{i,t-1}^j, \) and the appreciation rates of the fundamental house prices to be estimated by \( {\Delta P}_{i,t}^{j\ast } \). Further, we assume that the way prices respond to fundamental factors is given by a linear relationship

$$ {\Delta P}_{i,t}^j={\alpha}_0^j+{\alpha}_1^j{\Delta X}_{i,t}+{\theta}_{i,t}^j $$
(2)

Here \( {\alpha}_0^j+{\alpha}_1^j{\Delta X}_{i,t} \) is the change in the fundamental value, which we denote by \( {\Delta P}_{i,t}^{j\ast } \), and \( {\theta}_{i,t}^j \) denotes the “error term” which accounts both for momentum and mean reversion effects and is given by the equation

$$ {\theta}_{i,t}^j={\lambda}_0^j+{\lambda}_1^j{\Delta P}_{i,t-1}^j+{\lambda}_2^j\left({P}_{i,t-1}^{j\ast }-{P}_{i,t-1}^j\right)+{\varepsilon}_{i,t}^j $$
(3)

In this equation, the coefficient \( {\lambda}_1^j \) measures the momentum (serial correlation) while the coefficient \( {\lambda}_2^j \) measures the speed of adjustment to the long-run equilibrium. Combining eqs. (2) and (3) we obtain:

$$ {\Delta P}_{i,t}^j={\gamma}_0^j+{\alpha}_1^j{\Delta X}_{i,t}+{\lambda}_1^j{\Delta P}_{i,t-1}^j+{\lambda}_2^j\left({P}_{i,t-1}^{j\ast }-{P}_{i,t-1}^j\right)+{\varepsilon}_{i,t}^j $$
(4.OLS)

where \( {\gamma}_0^j={\alpha}_0^j+{\lambda}_0^j \). In addition to an OLS specification, we estimate fixed-effects models that allow for heterogeneity among MSAs and/or time.Footnote 4

$$ {\Delta P}_{i,t}^j={\gamma}_0^j+{\alpha}_1^j{\Delta X}_{i,t}+{\lambda}_1^j{\Delta P}_{i,t-1}^j+{\lambda}_2^j\left({P}_{i,t-1}^{j\ast }-{P}_{i,t-1}^j\right)+{\vartheta}_i+{\varepsilon}_{i,t}^j $$
(4.MSA-FE)
$$ {\Delta P}_{i,t}^j={\gamma}_0^j+{\alpha}_1^j{\Delta X}_{i,t}+{\lambda}_1^j{\Delta P}_{i,t-1}^j+{\lambda}_2^j\left({P}_{i,t-1}^{j\ast }-{P}_{i,t-1}^j\right)+{\mu}_t+{\varepsilon}_{i,t}^j $$
(4.Time-FE)

$$ {\Delta P}_{i,t}^j={\gamma}_0^j+{\alpha}_1^j{\Delta X}_{i,t}+{\lambda}_1^j{\Delta P}_{i,t-1}^j+{\lambda}_2^j\left({P}_{i,t-1}^{j\ast }-{P}_{i,t-1}^j\right)+{\vartheta}_i+{\mu}_t+{\varepsilon}_{i,t}^j $$
(4.MSA&Time-FE)
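One common way to handle MSA fixed effects such as \( {\vartheta}_i \) is the within transformation, which subtracts MSA-level means from every variable so that the MSA-specific intercept drops out. The sketch below illustrates only this demeaning step. The paper does not state which estimator or software was used, so the approach, the function name `demean_by_group`, and the numbers are assumptions made for illustration.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Subtract each group's mean from its observations (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    return [v - means[g] for v, g in zip(values, groups)]

# Hypothetical monthly appreciation rates for two MSAs (illustrative numbers only).
msa = ["A", "A", "A", "B", "B", "B"]
dp = [0.010, 0.020, 0.030, 0.000, 0.004, 0.008]
dp_within = demean_by_group(dp, msa)
```

Applying the same transformation to every regressor and then running OLS on the demeaned data would give the slope estimates of a model with MSA fixed effects; time fixed effects can be treated analogously by demeaning within each month.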

One difficulty with this estimation is that the fundamental values \( {P}_{i,t-1}^{j\ast } \) depend on the estimates of the different versions of eqs. (4) while at the same time they are part of the error correction term which is used as an explanatory variable in these equations. We resolve this issue using the iterative procedure proposed by Abraham and Hendershott (1996). We assume that the observed house price in December 1999 corresponds to its fundamental value (i.e. \( {P}_{i,t}^{j\ast }={P}_{i,t}^j \) for t=December 1999) and recover the fundamental value time series from the relationship \( {P}_{i,t}^{j\ast }={P}_{i,t-1}^{j\ast }+\Delta {P}_{i,t}^{j\ast } \). We then re-estimate eqs. (4) and re-calculate fundamental prices repeatedly until the estimates stabilize (typically we need to perform up to five iterations).Footnote 5
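The iterative logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single MSA and a single fundamental variable, anchors the fundamental value at the first observation rather than at December 1999, sets \( {\lambda}_0^j=0 \) so that the estimated intercept is attributed entirely to fundamental growth, and runs on synthetic data. All function names are hypothetical.

```python
import random

def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def build_fundamental(p0, dx, g0, a1):
    """P*_t = P*_{t-1} + g0 + a1 * dX_t, anchored at P*_0 = P_0 (assumption)."""
    p_star = [p0]
    for t in range(1, len(dx)):
        p_star.append(p_star[-1] + g0 + a1 * dx[t])
    return p_star

def fit_iteratively(p, x, max_iter=5, tol=1e-8):
    n = len(p)
    dp = [0.0] + [p[t] - p[t - 1] for t in range(1, n)]  # dp[t] = P_t - P_{t-1}
    dx = [0.0] + [x[t] - x[t - 1] for t in range(1, n)]
    sample = range(2, n)  # the lagged return dp[t-1] exists only for t >= 2
    y = [dp[t] for t in sample]
    # Starting values: omit the error correction term, which is not yet known.
    g0, a1, _ = ols([[1.0, dx[t], dp[t - 1]] for t in sample], y)
    p_star = build_fundamental(p[0], dx, g0, a1)
    coefs, n_iter = None, 0
    for n_iter in range(1, max_iter + 1):
        rows = [[1.0, dx[t], dp[t - 1], p_star[t - 1] - p[t - 1]] for t in sample]
        new = ols(rows, y)
        p_star = build_fundamental(p[0], dx, new[0], new[1])
        done = coefs is not None and max(abs(u - v) for u, v in zip(new, coefs)) < tol
        coefs = new
        if done:
            break
    return coefs, p_star, n_iter

# Synthetic log price and log fundamental series (not the paper's data).
random.seed(0)
x, p, p_true, prev_dp = [0.0], [0.0], [0.0], 0.0
for t in range(1, 240):
    x.append(x[-1] + 0.002 + random.gauss(0.0, 0.01))
    d_star = 0.001 + 0.8 * (x[t] - x[t - 1])
    d = d_star + 0.5 * prev_dp + 0.05 * (p_true[-1] - p[-1]) + random.gauss(0.0, 0.002)
    p_true.append(p_true[-1] + d_star)
    p.append(p[-1] + d)
    prev_dp = d

coefs, p_star, n_iter = fit_iteratively(p, x)
bubble = [pt - ps for pt, ps in zip(p, p_star)]  # deviation from the fundamental value
```

The list `coefs` holds the intercept, the coefficient on the fundamental, the momentum coefficient, and the speed-of-adjustment coefficient, in that order. In a panel setting the same loop would be run on the stacked MSA observations.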

We then analyze how the current (and past) values of the mortgage default risk index, \( {MDRI}_{i,t} \), affect the future values of the fundamental component \( {P}_{i,t}^{j\ast } \) and the bubble component \( {B}_{i,t}^j={P}_{i,t}^j-{P}_{i,t}^{j\ast } \) of local house prices as well as the foreclosure rates \( {HF}_{i,t} \). Furthermore, we use the house price decomposition to explore how changes in the fundamental \( {P}_{i,t}^{j\ast } \) and the bubble \( {B}_{i,t}^j \) components of home values affect foreclosure rates \( {HF}_{i,t} \).

Data, Variable Construction, and Summary Statistics

The estimation of the fundamental house price model is based on a panel of 107 MSAs located in 29 U.S. states. A map with the location of these MSAs is presented in Fig. 1. For each MSA we observe the monthly growth rate of house prices and local fundamental factors. Further, in our analysis of the effect of the mortgage default risk index on house prices and foreclosures, we include additional variables that account for the mortgage market conditions in each MSA.

Fig. 1 MSAs and states (recourse versus non-recourse). Notes: The red dots represent the locations of the MSAs. The dark blue areas represent recourse states, while the light blue areas represent non-recourse states.

All MSAs in the dataset are listed in Table 7 in the Appendix along with the state in which they are located. The table also contains the classification of states into the recourse and non-recourse categories depending on whether states allow lenders to pursue a deficiency judgment against foreclosed borrowers (we use the classification of Ghent and Kudlyak 2011). In Fig. 1, the recourse states are depicted in dark blue and the non-recourse states in light blue.

Local House Prices and Fundamental Factors

In this study, we use the monthly Zillow home value indicesFootnote 6 for the period from April 1996 to December 2016. These indices are constructed from deed records using a hedonic methodology which accounts for individual attributes such as size and number of bedrooms and bathrooms. A major challenge in the construction of home value indices is the changing composition of the properties sold in different periods. Indices based on a repeat-sales methodology – such as the S&P Case-Shiller index or the index of the Federal Housing Finance Agency – account for this issue by using only properties that are sold more than once. This methodology has limitations for smaller regions or smaller market segments where the number of repeat sales is limited.Footnote 7 Zillow, on the other hand, aggregates all transactions to create valuations for all properties (Zestimates) based on their characteristicsFootnote 8 and uses the Zestimates to construct its regional price indices (see, e.g. Dorsey et al. 2010 for a discussion of this approach).

As we are interested in the dynamics of different market segments, we use the top and the bottom house price tiers in our analysis. The top tier index captures the median value of homes within the 65th to 95th percentile range while the bottom tier index captures the median value of homes within the 5th to 35th percentile range for each MSA. The dynamics of the top and the bottom price tiers for three of the MSAs in the dataset (San Diego, Minneapolis, and Phoenix) are presented in Fig. 2 (Panels A and B). These three MSAs represent a cyclical, a steady, and a bubble market, respectively (Mayer 2011).
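To make the tier definitions concrete, the following sketch computes the median home value within a percentile band. It is only an illustration: the percentile convention (integer rank cut-offs on the sorted values), the function name `tier_median`, and the data are assumptions, and Zillow's actual index construction is more involved.

```python
from statistics import median

def tier_median(values, lo_pct, hi_pct):
    """Median of the values whose rank lies between two integer percentiles."""
    s = sorted(values)
    lo = len(s) * lo_pct // 100   # index of the lower cut-off
    hi = len(s) * hi_pct // 100   # index just past the upper cut-off
    return median(s[lo:hi])

# Hypothetical home values for one MSA and month: 1, 2, ..., 100.
home_values = list(range(1, 101))
bottom_tier = tier_median(home_values, 5, 35)    # 20.5 for this toy data
top_tier = tier_median(home_values, 65, 95)      # 80.5 for this toy data
```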

Fig. 2 Dynamics of house price indices for selected MSAs. a Bottom-tier house price index (solid lines) and fundamental price (dashed lines). b Top-tier house price index (solid lines) and fundamental price (dashed lines). c Bubble component of the bottom-tier house price index. d Bubble component of the top-tier house price index. Notes: San Diego, Minneapolis, and Phoenix represent examples of a cyclical, a steady, and a bubble market, respectively, according to the classification of Mayer (2011). The bubble component is calculated as the deviation from the fundamental house price, i.e. the difference between the logs of the house price index and its fundamental component: \( {B}_{i,t}^j={P}_{i,t}^j-{P}_{i,t}^{j\ast } \)

Although there is variation across regional housing market segments, on average the indices peak in late 2006, and then decline and reach their lowest values between 2009 and 2012. They recover thereafter, almost reaching their pre-crisis values around 2016. In our analysis, we use the log differences of the price indices (i.e., the continuously compounded returns) for the two market segments.

The fundamental variables used include the population, personal income per capita, total non-farm employment, construction cost, a derived user cost of homeownership, and the land supply elasticity index in the MSA. Descriptive statistics of these variables, except for land supply elasticity which is time-invariant, along with unit root tests are presented in Table 1.

Table 1 Descriptive statistics

The population and personal income data are collected from the Bureau of Economic Analysis. We use cubic spline interpolation (De Boor 1978) to derive monthly values from the original annual observations. The total non-farm employment, available at the state level, is collected from Datastream and used for all metropolitan areas located in the same state. The construction cost is measured by the price index of new single-family houses under construction, which is available from the U.S. Census Bureau. As only the national index is available in monthly frequency, the change in construction costs varies over time but not across MSAs. As a measure of land supply elasticity of MSAs, we use the land supply estimates derived by Saiz (2010).Footnote 9
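The annual-to-monthly interpolation step can be illustrated with a small natural cubic spline written in plain Python. This is a sketch under assumptions: the boundary condition used by the authors is not stated, so the natural condition (zero second derivative at both ends) is chosen here for simplicity, and the function names and population figures are hypothetical. In practice a library routine would typically be used instead.

```python
from bisect import bisect_right

def natural_spline_second_derivs(xs, ys):
    """Second derivatives m[i] at the knots of a natural cubic spline."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                      # natural boundary: m[0] = m[n] = 0
    if n < 2:
        return m
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm for the tridiagonal system in the interior knots.
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    m[n - 1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m

def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x using the piecewise cubic on its interval."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)

# Hypothetical annual population figures (in thousands) for one MSA.
years = [2000, 2001, 2002, 2003, 2004]
population = [100.0, 102.0, 105.0, 109.0, 114.0]
m = natural_spline_second_derivs(years, population)
monthly = [spline_eval(years, population, m, 2000 + k / 12) for k in range(49)]
```

The monthly series passes through the annual observations by construction, and the monthly growth rates used in the regressions would then be computed from the interpolated levels.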

To facilitate comparison to previous research, we construct the user cost by the method of Capozza et al. (2004) which accounts for mortgage rates, taxes, expected appreciation as well as annual maintenance and depreciation of properties. That is, the user cost is constructed by the formula

$$ \mathrm{User}\ \mathrm{cost}=\left(\mathrm{Mortgage}\ \mathrm{Rate}+\mathrm{Property}\ \mathrm{Tax}\ \mathrm{Rate}\right)\times \left(1-\mathrm{Income}\ \mathrm{Tax}\ \mathrm{Rate}\right)-\mathrm{Inflation}+0.03 $$
(5)

Here the “Mortgage Rate” is the 30-Year fixed-rate mortgage average in the United States, collected from the Federal Reserve Bank of St. Louis. The “Property Tax Rate,” collected from Wallethub,Footnote 10 is the effective real-estate state tax rate. The “Income tax rate” is the sum of the average federal income tax rate and average state income tax rate for the middle quintile of households. The federal income tax rate is collected from the Urban-Brookings Tax Policy Center,Footnote 11 while the state income tax rate is collected from the National Bureau of Economic Research.Footnote 12 For inflation, we use the CPI provided by the Federal Reserve Bank of St. Louis. The annual maintenance and obsolescence of properties are set at 3% as indicated in eq. (5).
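As a quick check of eq. (5), the formula can be written as a one-line function. The input values below are illustrative and are not taken from the paper's data; the function name `user_cost` is hypothetical, and all rates are expressed as decimals.

```python
def user_cost(mortgage_rate, property_tax_rate, income_tax_rate, inflation,
              maintenance=0.03):
    """Eq. (5): after-tax financing and property tax cost, net of inflation,
    plus a fixed 3% allowance for maintenance and obsolescence."""
    return ((mortgage_rate + property_tax_rate) * (1.0 - income_tax_rate)
            - inflation + maintenance)

# Illustrative inputs: 6% mortgage rate, 1% property tax, 20% income tax, 2% inflation.
example = user_cost(0.06, 0.01, 0.20, 0.02)   # (0.07 * 0.80) - 0.02 + 0.03 = 0.066
```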

Mortgage Lending

To account for local mortgage market conditions we construct two variables from the Home Mortgage Disclosure Act (HMDA) dataFootnote 13: the total amount of mortgage loans in a given year in each MSA (Loan supply) and the percentage of loans that are subprime, or higher-priced mortgage loans in each MSA (Subprime). Loans are categorized as subprime following the classification of Mayer and Pence (2008) according to which a mortgage is a subprime mortgage if its rate spread exceeds 3% for first-lien mortgages and 5% for junior lien mortgages.Footnote 14
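The construction of the two lending variables can be sketched as follows. The record layout (`amount`, `rate_spread`, `lien`) is a hypothetical simplification of the HMDA fields, the rate spread is assumed to be in percentage points, and the subprime share is assumed to be computed by loan count rather than by loan amount, which the text does not specify.

```python
def is_subprime(rate_spread, lien):
    """Rate-spread rule: above 3 points for first liens, above 5 for junior liens."""
    threshold = 3.0 if lien == "first" else 5.0
    return rate_spread > threshold

def msa_lending_variables(loans):
    """Return (total loan amount, percentage of loans classified as subprime)."""
    total = sum(loan["amount"] for loan in loans)
    n_subprime = sum(1 for loan in loans
                     if is_subprime(loan["rate_spread"], loan["lien"]))
    return total, 100.0 * n_subprime / len(loans)

# Hypothetical loans originated in one MSA in one year (amounts in $ thousands).
loans = [
    {"amount": 200, "rate_spread": 3.5, "lien": "first"},   # subprime
    {"amount": 150, "rate_spread": 1.0, "lien": "first"},
    {"amount": 50, "rate_spread": 5.5, "lien": "junior"},   # subprime
    {"amount": 100, "rate_spread": 4.0, "lien": "junior"},
]
loan_supply, subprime_pct = msa_lending_variables(loans)
```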

Mortgage Default Risk

The Mortgage Default Risk Index (MDRI hereafter) of Chauvet et al. (2016) is constructed from the Search Volume Index (SVI) data for terms such as “foreclosure help” and “government mortgage help” in US states published by Google Trends.Footnote 15 The MDRI is obtained from the UCLA ZIMAN Center for Real Estate.Footnote 16 Zillow also publishes a Homes Foreclosed index (HF hereafter) which gives the number of homes foreclosed per 10,000 homes in metropolitan areas each month. As an illustration, in Fig. 3 (see Panels A and B) we present the dynamics of the MDRI and HF in three of the MSAs in our sample – San Diego, Minneapolis, and Phoenix. Both indicators start to increase in early 2007 and reach their peak around 2008 in San Diego, and around 2009 in Minneapolis and Phoenix. The descriptive statistics of these variables are presented in Table 1.

Fig. 3 Default risk indices.

Results

We first explore how real house prices respond to changes in local fundamental factors by estimating the models given in eq. (4). In particular, we consider population, personal income, employment, as well as the variable we created for the user cost, construction cost, and the land supply elasticity of the MSA (cf. Capozza et al. 2004; Stevenson 2008). As a preliminary step, we perform unit root tests on the tiered house price indices as well as the fundamental variables (see the last two columns in Table 1). These variables are non-stationary in levels and stationary in first differences. This points to the inherent difficulties that would be present if we tried to directly use the levels of these variables in our statistical analysis. Furthermore, it justifies our focus on growth rates and the use of an error correction modeling approach.

Long-Run Equilibrium Relationship

The regression results of the error correction models specified in the four versions of eq. (4) are presented in Table 2. They include OLS estimates as well as estimates of fixed-effect models in which we control for MSA and time fixed effects.

Table 2 Estimates of the fundamental house price model

The coefficient estimates for all fundamental variables have the anticipated sign and are statistically significant at the 1% or 5% level. As expected, growth in population, personal income, and employment has a positive impact on house prices. An increase in user cost, a significant component of which is the mortgage interest rate, is associated with lower house price growth. In contrast, an increase in construction cost leads to an increase in home values. Further, the relationship between the land supply index and house prices is negative, as found in previous literature. The error correction term is significant, indicating that both the top and the bottom house price tiers adjust to their long-run equilibrium values. Similarly, the autoregressive coefficient is positive and statistically significant, indicating the presence of momentum in housing returns for both house price tiers.

The magnitude of the coefficients suggests that bottom tier homes are more sensitive to changes in population, employment, user cost, and construction cost, and that they exhibit stronger momentum. We formally test whether the coefficients for the top tier and the bottom tier are significantly different from each other using the OLS model specification (Model 1 in Table 2). In particular, we construct a dummy variable “Toptier,” which takes on the value of one for the top tier and zero for the bottom tier index. We include it as a regressor along with the interactions of this variable with the fundamental variables. We estimate this regression using Abraham and Hendershott’s (1996) iterative method described in the “Methodology” section by pooling the top tier and bottom tier observations together. We find that only the coefficients for the interaction variables \( \mathrm{Toptier}\ast {\mathrm{House}\ \mathrm{Price}}_{t-1} \) and (Toptier ∗ Construction cost) are significant. They have a negative sign, indicating that starter homes exhibit a stronger momentum effect and respond more strongly to construction cost than trade-up homes.

The fundamental house price model allows us to disaggregate house price indices into their fundamental and bubble components. Using the estimates of Model 2 (Panel Fixed Effects) in Table 2, we calculate these two components of house prices and present their dynamics for three of the MSAs (San Diego, Minneapolis, and Phoenix) in Fig. 2. In the following subsections, we analyze whether the MDRI helps predict these components of house prices and whether these components affect future foreclosures.

Effect of MDRI on House Prices

As a next step, we explore how household sentiment revealed by the mortgage default risk index (MDRI) impacts house prices. To account for mortgage market conditions, we add as regressors two variables that we constructed from HMDA data: the total amount of mortgage lending in the previous year (Total Loans), and the percentage of mortgage loans that are classified as subprime (Subprime). The results are reported in Table 3.

Table 3 Predictive power of MDRI for the house price appreciation rates (HP)

We find that an increase in the MDRI lowers house price growth in the following three to six months. In the regression in which all lags are included (see model 8), the coefficients for the lags between 3 and 6 months are statistically significant and range between −0.00017 and −0.0012. Further, as anticipated, we find that the amount of mortgage credit that flows into the area serves to increase home values, while subprime lending in the previous year dampens home values in the current year.

In the Appendix, we present regression results for the 2007–2012 and the 2013–2016 subsamples (see Table 8, Panel A and Panel B, respectively), and we find that the predictive power of MDRI applies mostly for the first subsample that includes the subprime mortgage crisis. In addition to considering actual growth rates of the house price tiers, we also analyze the decomposition of house prices into their fundamental and bubble components. We find that the MDRI dampens both the fundamental component (see Table 9) and the bubble component (see Table 10) of house prices.

The regression results also indicate that foreclosure legislation has a significant effect on home value appreciation rates. In particular, house price growth is on average lower in metropolitan areas located in recourse states, where lenders can pursue a deficiency judgment against borrowers. The coefficient for the recourse dummy variable in Table 3 is statistically significant at conventional levels and equals −0.0002 across all specifications. One possible explanation is that buying a home with a mortgage is less attractive to borrowers in a recourse state, where contracts lack the put option associated with mortgage default.

Effect of MDRI on Foreclosure Rates

Ghent and Kudlyak (2011, Table 1) provide an overview of foreclosure legislation across US states and present statistics on the timeline of the different stages of the foreclosure process in each state. If there are no delays, a non-contested non-judicial foreclosure can take as little as 60 days, yet often the process takes longer. Furthermore, foreclosures are followed by a redemption period with a duration of another 6 months. One can only speculate about when delinquent borrowers start searching online for help and how the intensity of their searches varies over time. To allow for different timing of online search, we explore alternative specifications. In Table 4 we present results for search behavior with lags between 1 and 6 months.

Table 4 Predictive power of MDRI for the Homes Foreclosed (monthly lags)

We find that an increase in the MDRI lowers foreclosures for horizons between 2 and 6 months. These coefficients are statistically significant and range between −0.0425 and −0.1645 (see model 8). In Table 5 we aggregate the Google searches over periods of 3 months and consider regressions with lags of up to a year.
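One way to form three-month aggregates of the search index is sketched below. It assumes that the monthly MDRI is averaged over non-overlapping lag windows (1 to 3, 4 to 6, 7 to 9, and 10 to 12 months before the month of interest). The averaging rule, the window boundaries, and the function name `three_month_lag_blocks` are assumptions made for illustration and may differ from the construction behind Table 5.

```python
def three_month_lag_blocks(mdri, t, n_blocks=4, width=3):
    """Average the monthly MDRI over consecutive lag windows before month t.

    With the defaults, the windows are lags 1-3, 4-6, 7-9 and 10-12.
    Assumption: aggregation by simple average (a sum would differ only by scale).
    """
    if t < n_blocks * width or t >= len(mdri):
        raise ValueError("month t needs a full year of history within the series")
    blocks = []
    for b in range(n_blocks):
        first_lag = b * width + 1
        last_lag = (b + 1) * width
        window = [mdri[t - k] for k in range(first_lag, last_lag + 1)]
        blocks.append(sum(window) / width)
    return blocks


# Hypothetical monthly MDRI values for 13 months (index 0 is the oldest month).
mdri = [float(v) for v in range(1, 14)]

blocks = three_month_lag_blocks(mdri, t=12)
```

Each element of `blocks` would then enter the foreclosure regression as one regressor, so a year of search history is summarized by four coefficients instead of twelve.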

Table 5 Predictive power of MDRI for the Homes Foreclosed (three-month lags)

In this setting, too, we find that Google searches reduce foreclosures (the coefficients for the lags between 1 and 3 months and the lags between 9 and 12 months are statistically significant). As a robustness check, in Table 11 in the Appendix we consider the 2007–2012 and the 2013–2016 subsamples and largely find an inverse relationship between the MDRI and future foreclosures.

These findings are consistent with the hypothesis that the MDRI captures learning effects. That is, by searching online some households may access information that helps them avert foreclosure. Further, consistent with the theory of strategic default, we find that foreclosure rates are lower in recourse states. Similar findings are reported in the recent empirical literature on the effect of recourse on default. For example, Ghent and Kudlyak (2011, Table 3) report that the probability of default of loans made in recourse states is on average 6.2% smaller (although their coefficient estimate is not significant).

Effect of House Prices on Foreclosure Rates

In Table 6 we report results for several alternative specifications. We disaggregate house prices to their fundamental and bubble components and study the contributing effect of these two components on foreclosures.

Table 6 Predictive power of house price declines for the Homes Foreclosed

The finding that is robust across all specifications is that foreclosures respond to changes in fundamental values. A 1% drop in fundamental home values increases the log of the homes foreclosed by 1.2, i.e. leads to about 3.3 extra foreclosures in the following month for every 10,000 homes. The bubble component, on the other hand, does not appear to have an impact on the proportion of defaulting homeowners. We interpret this as further evidence for strategic sophistication by homeowners. A shock to the fundamental component of home values would have a long-term effect on future house prices, while a shock to the bubble component would disappear over time as home values revert to their long-run equilibrium. Indeed, note that the speed of adjustment coefficient in the fundamental equation reported in Table 2 is significant and has the anticipated sign.

We further examine strategic default behavior by constructing a dummy variable for house price declines of more than 5% in the past twelve months (PriceDecline>5%). We do not find evidence that foreclosures are driven primarily by option-theoretic defaults: the coefficients for both the Recourse dummy and the interaction term of the Recourse dummy with the (PriceDecline>5%) dummy are not significant. These findings correspond to results reported in the recent empirical literature documenting the relative rarity of defaults due solely to strategic motives, and the relative importance of affordability constraints. Bhutta et al. (2017) find that homeowners do not walk away from their investments unless they are substantially more ‘underwater’ than option-theoretical models would predict, while Gerardi et al. (2018) find that default is primarily driven by income shocks rather than strategic motives.

We also explore how foreclosures respond to house price declines of more than 5% in the previous year and find that foreclosures increase more in the MSAs sustaining such declines. There is no evidence of a differential impact of such declines in recourse and non-recourse states. Furthermore, the effect on foreclosures is about the same regardless of whether the decline has originated in the top tier or the bottom tier of the local housing market.
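The (PriceDecline>5%) dummy and its interaction with the Recourse dummy can be illustrated with a minimal sketch. It assumes the decline is measured as the simple percentage change of the price index over the preceding twelve months. The function names and the index values are hypothetical.

```python
def price_decline_dummy(price_index, t, threshold=0.05, window=12):
    """Return 1 if the index fell by more than `threshold` over the past `window` months.

    Assumption: the decline is the simple percentage change between
    month t - window and month t.
    """
    if t < window or t >= len(price_index):
        raise ValueError("month t needs `window` months of history within the series")
    change = price_index[t] / price_index[t - window] - 1.0
    return 1 if change < -threshold else 0


def recourse_interaction(recourse, decline):
    """Interaction term: 1 only for a recourse-state MSA with a large price decline."""
    return recourse * decline


# Hypothetical 13-month index paths: one falls 6% over the year, one falls 4%.
falling = [100.0] * 12 + [94.0]
mild = [100.0] * 12 + [96.0]

decline_falling = price_decline_dummy(falling, t=12)
decline_mild = price_decline_dummy(mild, t=12)

# Interaction for a recourse-state MSA (recourse = 1) and a non-recourse one (recourse = 0).
inter_recourse = recourse_interaction(1, decline_falling)
inter_non_recourse = recourse_interaction(0, decline_falling)
```

In a regression, the coefficient on the interaction term measures whether large price declines affect foreclosures differently in recourse states than in non-recourse states.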

Conclusion

As of 2020, Google commands more than 92% of the worldwide search engine market, with an estimated 2 trillion global searches per year (www.hubspot.com). Extant research has established that Google searches provide timely indicators for social and economic activity in a variety of domains ranging from automotive sales to the spread of infections, to asset returns in financial and housing markets. Da et al. (2011) find that search volume data predict stock returns and conclude that “search data has the potential to objectively and directly reveal to empiricists the underlying belief of an entire population of households.” Chauvet et al. (2016) constructed a mortgage default risk index from data on Google search volumes for keywords such as “mortgage help” and “foreclosure assistance” and demonstrated that this index has predictive power for the returns on housing and mortgage-related assets.

In this paper, we analyze how this mortgage default risk index is related to house prices and foreclosure rates in local housing markets. Using a long-run equilibrium model, we disaggregate local house prices into their fundamental and bubble components. We then explore how the mortgage default risk index relates to future housing market outcomes such as house prices and foreclosures. In line with previous literature, we find that an increase in the mortgage default risk index leads to lower house price appreciation rates. Perhaps somewhat surprisingly, we also find that an increase in the mortgage default risk index reduces the percentage of foreclosures for various time horizons. One interpretation of these findings is that economic agents not only reveal their sentiments through their search behavior but also collect and process the information they access and as a consequence adapt their behavior. That is, through online searches for “foreclosure help” and “mortgage assistance” households can access relevant information that helps them avert foreclosures.

We also report new results on the interaction between housing and mortgage markets which suggest some degree of household strategic behavior. In particular, we find that declines in the fundamental component of house prices lead to an increase in foreclosure rates while declines in the transitory component of house prices have no statistically significant effect. Furthermore, foreclosures tend to be higher in metropolitan areas situated in non-recourse states where lenders cannot pursue a deficiency judgment against borrowers.

Beyond their predictive properties, online search data can also be used to empirically test economic models that incorporate the learning of economic agents and view equilibria as the outcome of adaptive behavior. The empirical assessment of such models is left for future research.