
Should we Fear the Shadow? House Prices, Shadow Inventory, and the Nascent Housing Recovery

  • Published in: The Journal of Real Estate Finance and Economics


Although a broad-based increase in house prices has been observed over the past year, not everyone is convinced the rise of house prices will persist and lead to a steady recovery of the economy. The main reason for this skepticism is uncertainty about the “shadow inventory”: foreclosed homes held by investors or as REOs, which have not yet hit the market but likely will as market prices rise. The volume of shadow inventory itself in local markets is largely unknown, as is its impact on the housing market. This study quantifies the size of the shadow inventory and investigates the spatial impact of the out-flow of shadow inventory. The scope of our study is a set of housing markets (AZ, CA, and FL) that vary in both their historic housing price volatility as well as institutional factors - such as foreclosure law statutes - that may influence the relationship between the shadow inventory and house price dynamics. To address the endogeneity that characterizes the spatial interaction of house prices and the out-flow of the shadow inventory, we utilize a simultaneous equation system of spatial autoregressions (SESSAR). The model is estimated using measures of the shadow inventory derived from DataQuick’s national transaction history database and county-level house price indices provided by Black Knight. Lastly, because our estimate - as well as all other existing estimates - of the shadow inventory relies upon string matching algorithms to identify entry into and exit out of REO status, we validate the accuracy of our measures of REOs using loss mitigation data from the OCC Mortgage Metrics database.



  1. This appreciation rate is based on the Federal Housing Finance Agency non-seasonally-adjusted purchase-only price index.

  2. For example, in a 2010 article in Business Insider Keith Jurow stated that “this massive number of homes [in the shadow inventory] will put enormous downward pressure on sales prices.”

  3. See Can (1992), Can and Megbolugbe (1997), Pace and Gilley (1997) and Pace et al. (1998) as examples.

  4. Although we could recover estimates of the fixed effects from the GS3SLS results, the fixed effects are nuisance parameters, so we do not report their estimated values in the paper.

  5. A small number of papers have employed spatial econometric techniques in a multi-equation framework. Steinnes and Fisher (1974) first tried to incorporate spatial interactions of employment and population in their intra-urban population and employment model. Boarnet (1994) extended the Carlino and Mills (1987) model by introducing spatial cross-correlation into the employment and population equation system. Henry et al. (1997) further extended Boarnet’s (1994) model by adding interaction terms between urban growth rates and the spatial cross-regressive lags as additional explanatory variables. Henry et al. (2001) augmented the Carlino and Mills (1987) and Boarnet (1994) models by incorporating spatial autoregressive lags, and compared these augmented models with the Henry et al. (1997) model. Gebremariam et al. (2011) extended the conventional two-equation system to a five-equation system, allowing for interdependences (spatial autocorrelation, spatial cross-correlation, and cross-equation correlation) among employment growth, migration behavior, household income, and local public services.

  6. The literature on foreclosure contagion effects has been reviewed extensively in Frame (2010) and Ihlanfeldt and Mayock (2013) and will not be summarized here.

  7. If a homeowner defaults on a note in a jurisdiction that uses deeds of trust instead of mortgages, a trustee’s deed upon sale is filed as a part of the foreclosure proceedings.

  8. Ideally, Class 1 and Class 2 should capture all the REO entries, and Class 3 and Class 4 would be associated with transactions following entry. Because of errors in public records or DataQuick’s distress identification algorithm, however, it is possible for the true entry into the REO stock to go unrecorded. In such cases, a property would be associated with a Class 3 or Class 4 transaction with no subsequent Class 1 or Class 2 transaction. Because Class 3 and Class 4 transactions cannot generally occur without the property first being transferred to a financial institution, we flag such transactions as entries into the REO stock.

  9. For instance, for a sale to be characterized as a Class 5 transaction, the algorithm must recognize the seller as a financial institution. If the formatting of the seller name field was such that DataQuick’s algorithm did not identify the seller as a financial institution, the true exit from the stock will not be captured by the distressed transaction field.

  10. If no Class 5 distressed sale or non-distressed sale is found within 3 years after a property enters the REO stock, we code the REO exit date for this property as 3 years after the date of entry. We limit a property’s stay in REO status to three years because such stays represent the extreme right tail of the REO liquidation time distribution. According to workout data provided as a part of the HAMP program, in Arizona, California, and Florida, properties that reached REO status were sold, on average, within 148 days, 182 days, and 146 days, respectively. Our three-year limit thus represents an REO spell that is six to seven times the average REO holding period in the states in our sample.

  11. This report is currently titled “Market Pulse” and was previously called “US Housing and Mortgage Trend.”

  12. CoreLogic produced this report on an irregular basis, and the values contained in the report pertain to sales activity only in the month of the report. Although we utilize quarterly data to estimate our models, for this comparison we report the sales measures derived from the DataQuick data measured at a monthly frequency.

  13. Pre-foreclosures include any mortgage that has been referred to an attorney to initiate legal foreclosure proceedings but has not yet gone to foreclosure sale. A loan is in post-sale foreclosure status if the bank has obtained title but the property is not yet being actively marketed. In the Mortgage Metrics data, a loan is classified as being in REO status only if the property is being actively marketed for sale.

  14. The date on which servicers report entry into the REO stock is expected to differ from the date the transfer from the borrower to the lender is recorded in the public record. This difference in timing necessitates using a window around the REO entry dates in the Mortgage Metrics and DataQuick files. We have chosen a 12-month window for our match definition as an initial analysis of the matches suggested that when a valid match occurs, the dates from the two sources are within 12 months 85 percent of the time.

  15. Although the Mortgage Metrics data begins in 2008, we identified a number of limitations in the foreclosure variable prior to 2010; for that reason, we have restricted our analysis to loans entering REO status starting in January of 2010.

  16. BK’s white paper on the construction of their HPI indicates that their index is constructed in a manner that eliminates the impact of distressed sales on the index dynamics. The price appreciation measures described in the rest of the text thus reflect appreciation rates of non-distressed properties.

  17. A cash sale is defined as an arm’s length transaction for which no mortgage was recorded in the public record.

  18. Specifically, loans securitized by Fannie Mae, Freddie Mac, or through the Ginnie Mae programs are not included in DataQuick’s loan performance data.

  19. The DataQuick MBS data contains loan performance through 2013. Because of the collapse of the private-label market following the financial crisis, there are virtually no post-2008 loan originations in our data.

  20. Specifically, we remove loans that: are in the process of being foreclosed; are being held as REO; or have borrowers that have declared bankruptcy.

  21. The primary difference between judicial and non-judicial foreclosure processes is the manner in which the lender takes possession of the home collateralizing the debt following default. In a judicial foreclosure, the lender must first file a claim against the borrower petitioning the court to allow for the seizure of the collateral, and the borrower is permitted to contest the lender’s claim. If a lender is successful in its petition, the court issues a judgment for the amount owed (generally inclusive of collection expenses) against the borrower, and a foreclosure auction is scheduled. The home is then deeded to the highest bidder at the auction, which is oftentimes the lender.

    In a non-judicial foreclosure, the lender is not required to seek a judgment against the borrower before foreclosing on the property, which speeds the foreclosure process significantly. In such states, following default, the lender must follow state-specific rules - such as filing a Notice of Default - before liquidating the property at a public auction.

    It should be noted that many states allow lenders to choose between pursuing judicial and non-judicial foreclosures; because of their expediency, however, non-judicial foreclosures are generally pursued in such states. Ghent and Kudlyak (2011) provide a detailed description of the foreclosure process for all U.S. states.

  22. Some of the variables that enter the model in natural logarithms (e.g., short sales) have zero values in some quarters. Because the natural logarithm of zero is undefined, we replace the zero values with 1 in such cases.

  23. We used an F-test to test for the presence of locational and temporal fixed effects. The results of these tests indicated that location and temporal fixed effects are statistically significant in all three equations.

  24. The standard errors based on the OLS estimation of Eq. 2 must be corrected for the loss of degrees of freedom associated with the inclusion of county-level fixed effects. Specifically, the variance of our estimator needs to be adjusted based on the ratio of the degrees of freedom from Eqs. 1 and 2. For example, suppose γ 1 is of dimension K in Eqs. 1 and 2. Then the adjustment factor for the variance of our estimator is (N T−2−KT)/(N T−2−KTN+1).

  25. Note that this specification does not include temporal fixed effects such as those that were included in Eq. 2 (i.e., \(\alpha_{m}(t)\)). We have excluded these controls to make the estimation results of the non-spatial simultaneous equations model comparable to those of the spatial simultaneous equations model that follows.

  26. In the case of the first dependent variable in the system, m=1 and \( y_{1,NT}^{e}=[\begin {array}{cc}y_{2,NT} & y_{3,NT} \end {array}]\). For \(m=2, y_{2,NT}^{e}=[\begin {array}{cc}y_{1,NT} & y_{3,NT} \end {array}]\). For \(m=3, y_{3,NT}^{e}=[\begin {array}{cc}y_{1,NT} & y_{2,NT} \end {array}]\).

  27. \(c_{N}\) is equivalent to a combination of the constant intercept and the locational fixed effects in Eq. 1.

  28. In addition to the results reported here, we also estimated spatial models that included both location and time period fixed effects. The results of these estimations suggested that including both types of fixed effects simultaneously removes too much of the variation to successfully identify the parameters of the model.

  29. The time-mean deviation operator is defined as \(J_{T}=I_{T}-\frac {1}{T} l_{T}l_{T}^{\prime },\) where \(I_{T}\) is a \(T\times T\) identity matrix and \(l_{T}\) is a \(T\times 1\) vector of ones. Lee and Yu (2010) suggest using an orthogonal transformation of \(J_{T}\) (i.e., the time-mean deviation operator) and \(J_{N}\) (i.e., the space-mean deviation operator) to maintain the linear independence of the original disturbances over the time dimension. We did not start with a structural approach by assuming that \(\varepsilon _{N}^{\ast }(t)\) in Eq. 7 has an iid distribution; instead, we can always restrict the transformed disturbances to be iid to avoid complications. Therefore, we retain the simple form of \(J_{T}\) to eliminate the locational fixed effects.

  30. Although the SAR structure is indeed the same (e.g., the parameters λ, ρ, β, and γ are not changed), the weight matrix is no longer the same. Because the weight matrix is row normalized, the time-transformed weight matrix, \((J_{T}\otimes I_{N})(I_{T}\otimes W_{N}^{\ast }),\) is equivalent to \((J_{T}\otimes I_{N})(I_{T}\otimes W_{N}^{\ast })(J_{T}\otimes I_{N}),\) with \((I_{T}\otimes W_{N}^{\ast })\) being the weight matrix of the stacked observations across T time periods. Thus, the time-deviation operator \((J_{T}\otimes I_{N})\) is passed to \(y_{N}^{\ast }(t)\) and \(u_{N}^{\ast }(t)\). Meanwhile, the transformed weight matrix \((J_{T}\otimes I_{N})(I_{T}\otimes W_{N}^{\ast })\) is no longer the same as \((I_{T}\otimes W_{N}^{\ast })\). We denote the time-transformed weight matrix as \(W_{N}\) to differentiate it from the original weight matrix \(W_{N}^{\ast }\).

  31. We assume that the system only involves one weight matrix. This assumption simplifies the estimation and is commonly made in applied work. Our results can be generalized in a straightforward way to the case in which each spatially lagged variable depends upon a weight matrix which is unique to that variable.

  32. For instance, \(Z_{1,NT}=(W_{NT}y_{1,NT},y_{2,NT},y_{3,NT},X_{1,NT})\).

  33. \((A_{1}\otimes A_{2})(A_{3}\otimes A_{4})=(A_{1}A_{3}\otimes A_{2}A_{4})\) when \(A_{1}\) and \(A_{3}\), and \(A_{2}\) and \(A_{4}\), are conformable.

  34. The existence of the spatial lag, \(W_{NT}Y_{NT}\), distinguishes the SESSAR model from traditional linear structural equation models; an implication of this difference is that conventional rank conditions can no longer be used to verify identification of the system. The interested reader is referred to Brown (1983), Roehrig (1988), and Benkard and Berry (2006), who discuss the identification of simultaneous equation systems, of which the SESSAR model is a special case with additive errors.

  35. Examples of such metrics include the inverse of the distance between units and the binary spatial contiguity measure.

  36. The migration pattern from county to county is reasonably stable between 2005 and 2010. For example, the number of migrants varies between 801 and 1,049 from Maricopa to Gila, and between 1,255 and 1,425 from Maricopa to Navajo during this time span.

  37. Migration patterns change slowly over time, and mobility flows from 2005 are very similar to those in 2006 (the first year of our data). That said, there is a concern that using a one-year lag in the IRS data may not be sufficient to render the weight matrix exogenous. As noted by an anonymous referee, an informal way to investigate the exogeneity of the weight matrix is to compare the estimated coefficients from our initial model with those of an identical model constructed using a truly exogenous weight matrix, such as a distance-based weight matrix; similar estimated coefficients suggest that any endogeneity issues regarding the weight matrix are minor. To that end, we re-estimated our system of equations using a distance-based weight matrix. We found that the coefficients on the exogenous variables were stable across the different choices of the weight matrix, but the spatial dependence parameters and the parameters on the endogenous variables differed across specifications. The latter result is unsurprising, as altering the weight matrix changes the spatial cross-equation interactions. We thus interpret the stability of the coefficients on the exogenous variables as evidence that any endogeneity issues associated with our choice of the migration-based weight matrix are limited.

  38. It should be emphasized that our definition of “direct impact” differs from that of LeSage and Pace (2009), who define the direct impact as an average partial derivative of the response variable with respect to the covariate.

  39. Such comparisons could include, for instance, alternative specifications of the regressor matrix, alternative specifications of the weight matrix, and alternative specifications of the disturbance term.

  40. We have only one alternative specification to test against. According to Kelejian and Piras (2009), when the number of alternative specifications, \(G\), is equal to one, the J test has the same power under \(H_{1}\) with either of the predictors.

  41. Although we do not have any data on maintenance expenditures or the quality of bank-owned homes, workout data from the HAMP program suggests that the level of deferred maintenance on homes entering the REO stock may differ significantly across states. The average time between last payment and foreclosure in Florida (503 days) is significantly longer than the pre-foreclosure timelines in Arizona (317 days) and California (368 days). If the time between the last payment and the transition into the REO stock is when most deferred maintenance occurs, then the gains from basic maintenance from the servicer handling the foreclosed home are likely to be highest in Florida.

  42. We thank an anonymous referee for pointing out this interpretation.

  43. To conserve space, we do not report the full results for the spatial models estimated using the distance-based weight matrix. These results are available upon request.

  44. The direct effect associated with an increase in the error term for a given endogenous variable is an increase in that particular variable. The total effect, however, may be negative if the shock also increases endogenous variables that are inversely related with the variable of interest.

  45. The average shocks reported in Table 11 are defined as \(\frac {1}{N}\sum \limits _{i=1}^{N}Shock_{ij}^{HPA}\), \(\frac {1}{N} \sum \limits _{i=1}^{N}Shock_{ij}^{Outflow}\), and \(\frac {1}{N} \sum \limits _{i=1}^{N}Shock_{ij}^{Inflow}\). The price changes associated with these shocks are defined as \(\frac {1}{N}\sum \limits _{i=1}^{N}P_{ij}^{HPA}\), \(\frac {1}{N}\sum \limits _{i=1}^{N}P_{ij}^{Outflow}\), and \(\frac {1}{N}\sum \limits _{i=1}^{N}P_{ij}^{Inflow}\).

  46. A discussion of immigration flows to Miami-Dade County can be accessed at

  47. The GS3SLS estimator has been proven to have the same asymptotic distribution as the true estimator, assuming some regularity conditions hold. See Kelejian and Prucha (2004) for details.

  48. See Baltagi (2001) for details. The entries of the ys and Xs are expressed as deviations from their averages over time.

  49. In our specification of \(X_{1,NT}\), \(X_{2,NT}\), and \(X_{3,NT}\), there exist duplicate exogenous variables. To avoid multicollinearity in the estimation, we allow each distinct variable to appear only once in the specification of \(X_{NT}\).

  50. Lee’s paper (2003) was written before Kelejian and Prucha (2004). His proposed “optimal IVs” are developed in the context of the single SAR equation in Kelejian and Prucha (2004). Kelejian and Prucha used the same instrument, H, after they expanded the single equation setting to a simultaneous equations system in 2004.
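The matrix identities invoked in footnotes 29, 30, and 33 can be checked numerically. The following is a minimal sketch rather than the authors' code: the dimensions, the random row-normalized weight matrix, and the helper name `time_demeaning_operator` are assumptions made purely for illustration.

```python
import numpy as np

def time_demeaning_operator(T):
    """Time-mean deviation operator J_T = I_T - (1/T) l_T l_T' (footnote 29)."""
    ones = np.ones((T, 1))
    return np.eye(T) - ones @ ones.T / T

rng = np.random.default_rng(0)
N, T = 4, 3  # hypothetical number of spatial units and time periods

# Hypothetical row-normalized weight matrix with a zero diagonal.
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

J_T = time_demeaning_operator(T)
I_N, I_T = np.eye(N), np.eye(T)

# Mixed-product property (footnote 33): (A1 (x) A2)(A3 (x) A4) = (A1 A3) (x) (A2 A4).
lhs = np.kron(J_T, I_N) @ np.kron(I_T, W)
rhs = np.kron(J_T @ I_T, I_N @ W)
mixed_product_holds = np.allclose(lhs, rhs)

# Footnote 30: post-multiplying by (J_T (x) I_N) leaves the transformed matrix unchanged.
unchanged_by_second_demeaning = np.allclose(lhs, lhs @ np.kron(J_T, I_N))

# Demeaning a stacked vector y = (y(1)', ..., y(T)')' removes each unit's time mean.
y = rng.normal(size=N * T)
y_demeaned = np.kron(J_T, I_N) @ y
unit_time_means = y_demeaned.reshape(T, N).mean(axis=0)
```

Here the observations are assumed to be stacked period by period, so that each block of N entries holds all spatial units at one date; after the transformation every unit's time mean is numerically zero.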


References

  • Baltagi, B.H. (2001). Econometric analysis of panel data, 2nd ed. New York: John Wiley & Sons.

  • Benkard, C.L., & Berry, S. (2006). On the nonparametric identification of nonlinear simultaneous equations models: Comment on B. Brown (1983) and Roehrig (1988). Econometrica, 74, 1429–2440.

  • Boarnet, M. (1994). An empirical model of intra-metropolitan population and employment growth. Papers in Regional Science, 73(2), 135–153.

  • Brown, B.W. (1983). The identification problem in systems nonlinear in the variables. Econometrica, 51, 175–196.

  • Can, A. (1992). Specification and estimation of hedonic housing price models. Regional Science and Urban Economics, 22(3), 453–477.

  • Can, A., & Megbolugbe, I. (1997). Spatial dependence and house price index construction. The Journal of Real Estate Finance and Economics, 14(1), 203–222.

  • Carlino, G., & Mills, E. (1987). The determinants of county growth. Journal of Regional Science, 27(1), 39–54.

  • Cliff, A.D., & Ord, J.K. (1973). Spatial autocorrelation. London: Pion Ltd.

  • Coulson, E., & Li, H. (2013). Measuring the external benefits of homeownership. Journal of Urban Economics. Forthcoming.

  • Ellen, E., Madar, J., & Weselcouch, M. (2012). What’s really happening to the REO stock? An analysis of three cities: New York, Atlanta, and Miami. Furman Center for Real Estate and Urban Policy. Working Paper.

  • Ellen, I., Lacoe, J., & Sharygin, C. (2013). Do foreclosures cause crime? Journal of Urban Economics, 74, 59–70.

  • Frame, W. (2010). Estimating the effect of mortgage foreclosures on nearby property values: A critical review of the literature. Federal Reserve Bank of Atlanta Economic Review, 95(3), 1–9.

  • Gebremariam, G., Gebremedhin, T., Schaeffer, P., & Jackson, R. (2011). Modeling regional growth spillovers: An analysis of employment growth, migration behavior, local public services and household income in Appalachia. Working Paper.

  • Ghent, A., & Kudlyak, M. (2011). Recourse and residential mortgage default: Evidence from US States. Review of Financial Studies, 24(9), 3139–3186.

  • Henry, M., Schmitt, B., & Piguet, V. (2001). Spatial econometric models for simultaneous systems: Application to rural community growth in France. International Regional Science Review, 24(2), 171–193.

  • Henry, M., Barkley, D., & Bao, S. (1997). The Hinterland’s stake in Metropolitan growth: Evidence from selected southern regions. Journal of Regional Science, 37(3), 479–501.

  • Kiefer, H., & Kiefer, L. (2011). The co-movement of mortgage foreclosure rate and house price depreciation: A spatial simultaneous equation system. Working Paper.

  • Harding, J., Rosenblatt, E., & Yao, V. (2009). The contagion effect of foreclosed properties. Journal of Urban Economics, 66(3), 164–178.

  • Ihlanfeldt, K., & Mayock, T. (2013). The impact of REO sales on neighborhoods and their residents. Journal of Real Estate Finance and Economics. Forthcoming.

  • Immergluck, D. (2010). The accumulation of lender-owned homes during the U.S. mortgage crisis: Examining metropolitan REO inventories. Housing Policy Debate, 20(4), 619–645.

  • Kelejian, H., & Piras, G. (2009). An extension of Kelejian’s spatial J-Test and an application for a test of the error structure. Working Paper.

  • Kelejian, H. (2008). A spatial J-test for model specification against a single or a set of nonnested alternatives. Working Paper.

  • Kelejian, H., & Prucha, I. (2004). Estimation of simultaneous systems of spatially interrelated cross sectional equations. Journal of Econometrics, 118, 27–50.

  • Lee, L.F., & Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154, 165–185.

  • Lee, L.F. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews, 22(4), 307–335.

  • LeSage, J., & Pace, K. (2009). Introduction to spatial econometrics. Boca Raton: CRC Press.

  • Pace, K., & Gilley, O. (1997). Using the spatial configuration of the data to improve estimation. The Journal of Real Estate Finance and Economics, 14(3), 333–340.

  • Pace, K., Barry, R., Clapp, J., & Rodriguez, R. (1998). Spatio-temporal estimation of neighborhood effects. The Journal of Real Estate Finance and Economics, 17(1), 15–34.

  • Roehrig, C. S. (1988). Conditions for identification in nonparametric and parametric models. Econometrica, 56, 433–447.

  • Smith, G., & Duda, S. (2009). Roadblock to recovery: Examining the disparate impact of vacant lender-owned properties in Chicago. Woodstock Institute.

  • Steinnes, D., & Fisher, W. (1974). An econometric model of intra-urban location. Journal of Regional Science, 14(1), 65–80.

  • Wheaton, W. (1990). Vacancy, search, and prices in a housing market matching model. Journal of Political Economy, 98(6), 1270–1292.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Tom Mayock.

Additional information

The views in the paper are those of the authors alone and do not reflect those of the Office of the Comptroller of the Currency, the U.S. Department of the Treasury, or Freddie Mac. The authors thank Fredrik Andersson, Ben Keys, Xiaodong Liu, participants at the 2013 meetings of the Urban Economics Association, participants at the 2013 AREUEA National Conference, the VII World Conference of the Spatial Econometrics Association, and the OCC seminar series for helpful comments.


Appendix A: Estimation Procedures

In this section, we first illustrate Kelejian and Prucha’s (2004) full information generalized spatial three stage least squares (GS3SLS) procedure, and then explore a series of ideal instrumental variables along the lines of Lee’s (2003) “optimal instrument” (footnote 47). Because our models are estimated using panel data with location-specific fixed effects, we first demean all the dependent and explanatory variables (footnote 48). Before applying GS3SLS, we also stack the observations across spatial units and over time periods as specified in Eqs. 13, 14, and 15.

A.1 Kelejian and Prucha’s GS3SLS

The first step in the GS3SLS procedure is to obtain an initial 2SLS estimate of \(\delta_{m}\) for each m in Eq. 16 using the instrumental variable matrix \(H=[\begin {array}{ccc}X_{NT} & W_{NT}X_{NT} & W_{NT}^{2}X_{NT} \end {array}]\), where \(X_{NT}=[\begin {array}{ccc}X_{1,NT} & X_{2,NT} & X_{3,NT} \end {array}]\) is a matrix including all the exogenous variables in the system (footnote 49). This initial 2SLS estimate is often called generalized spatial 2SLS (GS2SLS) in the spatial econometrics literature. In the second step, the residuals from the previous step are used to form an estimate of \(\rho_{m}\) based on several moment conditions. The last step of GS3SLS uses the estimate of \(\rho_{m}\) from step 2 to update the initial 2SLS estimate of \(\delta_{m}\) by re-estimating the Cochrane-Orcutt-type transformed equation \( y_{m,NT}^{\ast }=Z_{m,NT}^{\ast }\delta _{m}+\varepsilon _{m,NT},\) with \(y_{m,NT}^{\ast }=y_{m,NT}-\widehat{\rho }_{m}W_{NT}y_{m,NT}\) and \(Z_{m,NT}^{\ast }=Z_{m,NT}-\widehat{\rho }_{m}W_{NT}Z_{m,NT}\); all three equations are then stacked together to form the final GLS estimate of the \(\delta_{m}\)s, accounting for the structure of Σ. Altogether, the GS3SLS method updates the \(\delta_{m}\)s three times to efficiently utilize all the information available; that is, the procedure uses both within-equation and cross-equation information to estimate the parameters. The three steps of this process are spelled out in detail below.

A.1.1 Step 1 - Initial 2SLS (GS2SLS)

We first obtain an initial estimate of \(\delta_{m}\) (\(m=1,2,3\)) using 2SLS for each equation. Define \(P_{H}\) as

$$P_{H}=H(H^{\prime }H)^{-1}H^{\prime }, $$

where \(H=(X_{NT},W_{NT}X_{NT},W_{NT}^{2}X_{NT})\) with \(X_{NT}=(X_{1,NT},X_{2,NT},X_{3,NT})\). Now define

$$\widetilde{Z}_{m,NT}=\left[ \begin{array}{ccc} \widetilde{y}_{m,NT} & X_{m,NT} & \widetilde{\overline{\overline{y}}}_{m,NT} \end{array} \right], $$

where \(\widetilde {y}_{m,NT}=P_{H}y_{m,NT}, X_{m,NT}=P_{H}X_{m,NT}\), and \(\widetilde {\overline {\overline {y}}}_{m,NT}=P_{H}\overline {y}_{m,NT}\). The 2SLS estimator of δ m is then

$$\widetilde{\delta }_{m,2sl}=(\widetilde{Z}_{m,NT}^{\prime }Z_{m,NT})^{-1} \widetilde{Z}_{m,NT}^{\prime }y_{m,NT}, $$

and the 2SLS residuals are

$$\widetilde{u}_{m,2sl}=y_{m,NT}-Z_{m,NT}\widetilde{\delta }_{m,2sl}. $$
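To make Step 1 concrete, the sketch below applies 2SLS with the instrument set \(H=(X, WX, W^{2}X)\) to a simulated single-equation spatial lag model. It is an illustrative simplification and not the authors' implementation: the ring-shaped weight matrix, the sample size, the parameter values, and the function names (`spatial_instruments`, `project`, `gs2sls`) are all assumptions, and the regressor matrix here contains only the spatial lag and the exogenous variables, whereas the paper's \(Z_{m,NT}\) also includes the other endogenous variables.

```python
import numpy as np

def spatial_instruments(X, W):
    """Instrument matrix H = [X, WX, W^2 X]."""
    WX = W @ X
    return np.hstack([X, WX, W @ WX])

def project(H, A):
    """Return P_H A = H (H'H)^{-1} H' A without forming the n x n matrix P_H."""
    coef, *_ = np.linalg.lstsq(H, A, rcond=None)
    return H @ coef

def gs2sls(y, Z, H):
    """2SLS estimate (Z~' Z)^{-1} Z~' y with Z~ = P_H Z, plus the residuals y - Z delta."""
    Z_tilde = project(H, Z)
    delta = np.linalg.solve(Z_tilde.T @ Z, Z_tilde.T @ y)
    return delta, y - Z @ delta

# Simulated data (all values hypothetical).
rng = np.random.default_rng(1)
n = 200
idx = np.arange(n)
W = np.zeros((n, n))            # each unit's two ring neighbours get weight 0.5
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

lam, beta = 0.4, np.array([1.0, -2.0])
X = rng.normal(size=(n, 2))
u = 0.1 * rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - lam * W, X @ beta + u)  # y = lam W y + X beta + u

Z = np.column_stack([W @ y, X])  # spatial lag followed by exogenous regressors
H = spatial_instruments(X, W)
delta_2sls, u_2sls = gs2sls(y, Z, H)
```

Using a least-squares solve for the projection avoids building \(P_{H}\) explicitly, which matters once the stacked sample size NT becomes large.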

A.1.2 Step 2 - Spatial Autoregressive Parameter \(\rho_{m}\)

We then use \(\widetilde {u}_{m,2sl}\) from step 1 and some moment conditions to estimate the spatial autoregressive parameter \(\rho_{m}\) specified in Eq. 16. Based on the moment functions of \(\varepsilon_{m,NT}\) (\(m=1,2,3\)) in Eq. 12 and the relationship between \(u_{m,NT}\) and \(\varepsilon_{m,NT}\) summarized in Eq. 16, we have

$$\begin{array}{@{}rcl@{}} E\left(\frac{\varepsilon_{m,NT}^{\prime }\varepsilon_{m,NT}}{NT}\right) &=&E\left(\frac{ u_{m,NT}^{\prime }u_{m,NT}}{NT}+{\rho_{m}^{2}}\frac{\overline{u}_{m,NT}^{\prime }\overline{u}_{m,NT}}{NT}-2\rho_{m}\frac{u_{m,NT}^{\prime } \overline{u}_{m,NT}}{NT}\right)=\sigma_{mm}, \\ E\left(\frac{\overline{\varepsilon }_{m,NT}^{\prime }\overline{\varepsilon}_{m,NT}}{NT}\right)&=&E\left(\frac{\overline{u}_{m,NT}^{\prime }\overline{u}_{m,NT}}{NT}+{\rho_{m}^{2}}\frac{\overline{\overline{u}}_{m,NT}^{\prime }\overline{\overline{u}}_{m,NT}}{NT}-2\rho_{m}\frac{\overline{\overline{u}}_{m,NT}^{\prime}\overline{u}_{m,NT}}{NT}\right)=\frac{\sigma_{mm}tr(W_{NT}^{\prime }W_{NT})}{NT}, \\ E\left(\frac{\varepsilon_{m,NT}^{\prime}\overline{\varepsilon }_{m,NT}}{NT}\right) &=&E\left(\frac{u_{m,NT}^{\prime }\overline{u}_{m,NT}}{NT}+{\rho_{m}^{2}}\frac{\overline{u}_{m,NT}^{\prime }\overline{\overline{u}}_{m,NT}}{NT}-\rho_{m}\left( \frac{u_{m,NT}^{\prime}\overline{\overline{u}}_{m,NT}}{NT}+\frac{ \overline{u}_{m,NT}^{\prime }\overline{u}_{m,NT}}{NT}\right)\right)=0, \end{array} $$

where \(\overline {u}_{m,NT}=W_{NT}u_{m,NT}, \overline {\overline {u}}_{m,NT}=W_{NT}^{2}u_{m,NT}\), and \(\overline {\varepsilon }_{m,NT}=W_{NT} \varepsilon _{m,NT}\). Now denote

$$\begin{array}{@{}rcl@{}} \alpha_{m} &=&[\rho_{m},{\rho_{m}^{2}},\sigma_{mm}]^{\prime }, \\ g_{m,NT} &=&\frac{1}{NT}[\widetilde{u}_{m,2sl}^{\prime}\widetilde{u}_{m,2sl},\overline{\widetilde{u}}_{m,2sl}^{\prime }\overline{\widetilde{u}}_{m,2sl},\widetilde{u}_{m,2sl}^{\prime }\overline{\widetilde{u}}_{m,2sl}]^{\prime }, \\ G_{m,NT} &=&\frac{1}{NT}\left[ \begin{array}{ccc} 2\widetilde{u}_{m,2sl}^{\prime }\overline{\widetilde{u}}_{m,2sl} & - \overline{\widetilde{u}}_{m,2sl}^{\prime }\overline{\widetilde{u}}_{m,2sl} & NT \\ 2\overline{\overline{\widetilde{u}}}_{m,2sl}^{\prime }\overline{\widetilde{u} }_{m,2sl} & -\overline{\overline{\widetilde{u}}}_{m,2sl}^{\prime }\overline{ \overline{\widetilde{u}}}_{m,2sl} & tr(W_{NT}^{\prime }W_{NT}) \\ \widetilde{u}_{m,2sl}^{\prime }\overline{\overline{\widetilde{u}}}_{m,2sl}+ \overline{\widetilde{u}}_{m,2sl}^{\prime }\overline{\widetilde{u}}_{m,2sl} & -\overline{\widetilde{u}}_{m,2sl}^{\prime }\overline{\overline{\widetilde{u}} }_{m,2sl} & 0 \end{array} \right] , \end{array} $$

where \(\overline {\widetilde {u}}_{m,2sl}\) and \(\overline {\overline { \widetilde {u}}}_{m,2sl}\) are the values of the 2SLS residuals transformed in the same way as \(\overline {u}_{m,NT}\) and \(\overline {\overline {u}}_{m,NT}\). The generalized method of moments estimator of \((\rho_{m},\sigma_{mm})\) is defined as

$$(\widehat{\rho }_{m,gmm},\widetilde{\sigma }_{mm,gmm})=\arg \min(g_{m,NT}-G_{m,NT}\alpha_{m})^{\prime}(g_{m,NT}-G_{m,NT}\alpha_{m}). $$
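The minimization in Step 2 can be sketched numerically as follows. This is a simplified illustration under stated assumptions, not the authors' code: the sample moments follow the definitions of \(g_{m,NT}\) and \(G_{m,NT}\) with both scaled by the sample size, the least-squares problem is solved by concentrating out \(\sigma_{mm}\) and searching a grid over \(\rho_{m}\) (a numerical optimizer could be used instead), and the ring weight matrix, sample size, and function names are hypothetical.

```python
import numpy as np

def gm_moments(u, W):
    """Sample moment vector g and matrix G built from residuals u, both scaled by 1/n."""
    n = u.shape[0]
    ub = W @ u     # W u
    ubb = W @ ub   # W^2 u
    g = np.array([u @ u, ub @ ub, u @ ub]) / n
    G = np.array([
        [2 * (u @ ub),      -(ub @ ub),   n],
        [2 * (ubb @ ub),    -(ubb @ ubb), np.sum(W * W)],  # tr(W'W)
        [u @ ubb + ub @ ub, -(ub @ ubb),  0.0],
    ]) / n
    return g, G

def solve_gm(g, G, grid=None):
    """Minimize ||g - G [rho, rho^2, sigma]'||^2 over a grid of rho values.

    For each rho the objective is linear in sigma, so sigma is solved in closed form.
    """
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 1981)
    best = None
    for rho in grid:
        r = g - G[:, 0] * rho - G[:, 1] * rho**2
        sigma = (G[:, 2] @ r) / (G[:, 2] @ G[:, 2])
        objective = np.sum((r - G[:, 2] * sigma) ** 2)
        if best is None or objective < best[0]:
            best = (objective, rho, sigma)
    return best[1], best[2]

# Simulated spatially autocorrelated disturbances (all values hypothetical).
rng = np.random.default_rng(3)
n, rho_true = 1000, 0.5
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
eps = rng.normal(size=n)
u = np.linalg.solve(np.eye(n) - rho_true * W, eps)  # u = rho W u + eps

g, G = gm_moments(u, W)
rho_hat, sigma_hat = solve_gm(g, G)
```

In the paper's setting the input `u` would be the 2SLS residuals from Step 1 rather than simulated disturbances.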

A.1.3 Step 3 - GS3SLS

The estimated \(\rho_{m}\)s from step 2 are used to perform a Cochrane-Orcutt-type transformation to account for the spatial dependence in the disturbances. Let

$$\begin{array}{@{}rcl@{}} y_{m,NT}^{\ast } &=&y_{m,NT}-\widehat{\rho }_{m,gmm}W_{NT}y_{m,NT}, \\ Z_{m,NT}^{\ast } &=&Z_{m,NT}-\widehat{\rho}_{m,gmm}W_{NT}Z_{m,NT}. \end{array} $$

Then the updated 2SLS estimate of the transformed equation

$$ y_{m,NT}^{\ast }=Z_{m,NT}^{\ast }\delta_{m}+\varepsilon_{m,NT}, $$

can be obtained as

$$\widehat{\delta }_{m,2sls}^{\ast }=\left(\widehat{Z}_{m,NT}^{\ast \prime}Z_{m,NT}^{\ast }\right)^{-1}\widehat{Z}_{m,NT}^{\ast \prime }y_{m,NT}^{\ast }, $$

where \(\widehat {Z}_{m,NT}^{\ast }=P_{H}Z_{m,NT}^{\ast }\) and \( P_{H}=H(H^{\prime }H)^{-1}H^{\prime }\). To consistently estimate Σ, let

$$\widehat{\varepsilon }_{m,NT}=y_{m,NT}^{\ast }-Z_{m,NT}^{\ast }\widehat{\delta }_{m,2sls}^{\ast }, $$

and then compute

$$ \widehat{\sigma }_{ml}=\frac{1}{NT}\widehat{\varepsilon }_{m,NT}^{\prime } \widehat{\varepsilon }_{l,NT},\text{ for }m,l=1,2,3, $$

\(\widehat {\Sigma }\) then comprises the elements \(\widehat {\sigma }_{ml}\) defined in Eq. 32. We now stack the equations in Eq. 31 over \(m=1,2,3\) as

$$y_{NT}^{\ast }=Z_{NT}^{\ast }\delta +\varepsilon_{NT}, $$

where \(y_{NT}^{\ast }=(y_{1,NT}^{\ast \prime },y_{2,NT}^{\ast \prime },y_{3,NT}^{\ast \prime })^{\prime }, Z_{NT}^{\ast }=\left [\begin {array}{ccc}Z_{1,NT}^{\ast } & 0 & 0 \\0 & Z_{2,NT}^{\ast } & 0 \\0 & 0 & Z_{3,NT}^{\ast } \end {array}\right ]\), and \(\delta =(\delta _{1}^{\prime },\delta _{2}^{\prime },\delta _{3}^{\prime })^{\prime }\). Because \(E\varepsilon _{NT}\varepsilon _{NT}^{\prime }={\Sigma } \otimes I_{NT},\) the GS3SLS estimator of δ is then

$$\widehat{\delta }_{3sls}=[\widehat{Z}_{NT}^{\ast \prime }(\widehat{\Sigma }^{-1}\otimes I_{NT})Z_{NT}^{\ast }]^{-1}\widehat{Z}_{NT}^{\ast \prime }(\widehat{\Sigma }^{-1}\otimes I_{NT})y_{NT}^{\ast }, $$

where \(\widehat {Z}_{NT}^{\ast }=(I_{3}\otimes P_{H})Z_{NT}^{\ast }=\left [\begin {array}{ccc}P_{H}Z_{1,NT}^{\ast } & 0 & 0 \\0 & P_{H}Z_{2,NT}^{\ast } & 0 \\0 & 0 & P_{H}Z_{3,NT}^{\ast } \end {array}\right ] \). Small-sample inference concerning \(\widehat {\delta }_{3sls}\) can be based on the approximation

$$ \widehat{\delta }_{3sls}\sim N\left(\delta,\left[\widehat{Z}_{NT}^{\ast \prime}\left(\widehat{\Sigma }^{-1}\otimes I_{NT}\right)\widehat{Z}_{NT}^{\ast }\right]^{-1}\right). $$
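The third step can be sketched in a few lines of NumPy. The sketch below is an illustration under stated assumptions rather than the authors' code: the function name `gs3sls`, its argument layout, and the synthetic data are hypothetical, and the \(\widehat{\rho}_{m}\)s are taken as given. Because \(Z_{NT}^{\ast}\) is block diagonal, the \((m,l)\) block of \(\widehat{Z}_{NT}^{\ast\prime}(\widehat{\Sigma}^{-1}\otimes I_{NT})Z_{NT}^{\ast}\) is \(\widehat{\sigma}^{ml}\widehat{Z}_{m,NT}^{\ast\prime}Z_{l,NT}^{\ast}\), where \(\widehat{\sigma}^{ml}\) is an element of \(\widehat{\Sigma}^{-1}\), so the Kronecker product never has to be formed.

```python
import numpy as np

def gs3sls(ys, Zs, H, W, rhos):
    """Sketch of step 3: spatial filter, equation-wise 2SLS, stacked 3SLS."""
    n, M = ys[0].shape[0], len(ys)
    # Cochrane-Orcutt-type transformation with the estimated rho_m.
    ys_t = [y - r * (W @ y) for y, r in zip(ys, rhos)]
    Zs_t = [Z - r * (W @ Z) for Z, r in zip(Zs, rhos)]
    # Z_hat* = P_H Z*, computed by least squares instead of forming P_H.
    Zh = [H @ np.linalg.lstsq(H, Z, rcond=None)[0] for Z in Zs_t]
    # Equation-by-equation 2SLS on the transformed model.
    d2 = [np.linalg.solve(zh.T @ z, zh.T @ y) for zh, z, y in zip(Zh, Zs_t, ys_t)]
    E = np.column_stack([y - z @ d for y, z, d in zip(ys_t, Zs_t, d2)])
    Sigma = E.T @ E / n
    Sinv = np.linalg.inv(Sigma)
    # Stacked system, built block by block.
    A = np.block([[Sinv[m, l] * (Zh[m].T @ Zs_t[l]) for l in range(M)]
                  for m in range(M)])
    b = np.concatenate([sum(Sinv[m, l] * (Zh[m].T @ ys_t[l]) for l in range(M))
                        for m in range(M)])
    delta = np.linalg.solve(A, b)
    V = np.linalg.inv(np.block([[Sinv[m, l] * (Zh[m].T @ Zh[l]) for l in range(M)]
                                for m in range(M)]))
    return delta, Sigma, V

# Synthetic illustration (assumed data; regressors are exogenous here).
rng = np.random.default_rng(1)
n = 400
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
X = rng.standard_normal((n, 3))
Zs = [X[:, [0]], X[:, [1]], X[:, [2]]]
true_delta = np.array([1.0, -2.0, 0.5])
rhos = [0.3, 0.2, 0.1]
L = np.linalg.cholesky(np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]))
eps = rng.standard_normal((n, 3)) @ L.T
ys = [Zs[m][:, 0] * true_delta[m] + np.linalg.solve(np.eye(n) - rhos[m] * W, eps[:, m])
      for m in range(3)]
H = np.hstack([X, W @ X])
delta_hat, Sigma_hat, V_hat = gs3sls(ys, Zs, H, W, rhos)
print(delta_hat)
```

The returned `V` corresponds to the covariance matrix in the small-sample approximation above; square roots of its diagonal give approximate standard errors.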

A.2 Lee’s Ideal Instruments

Lee (2003) pointed out that although the IV matrix, \(H\), generates consistent estimates, it is not the “best” set of instruments in the sense of efficiency. To achieve asymptotic efficiency, he suggested using the fitted value of \(E(Z_{m,NT}^{\ast })\) in Eq. 31 as the new instruments to update the GS2SLS estimate in the third step. Because \( E(Z_{m,NT}^{\ast })\) is a function of \(\delta_{m}\) and \(\rho_{m}\), a series of best instrumental variable (IV) matrices proposed in Lee (2003) utilized the GS2SLS estimates of \(\delta_{m}\) and \(\rho_{m}\) from the first two steps to fit \(E(Z_{m,NT}^{\ast })\). Lee’s (2003) optimal IV matrices were designed for a single spatial equation model without considering correlations across multiple spatial equations (see footnote 50). Therefore, we cannot directly use the expression of his proposed IVs. Instead, we modify Lee’s approach to construct “ideal instruments” for our SESSAR model. \(E(Z_{NT}^{\ast })\) can be written as

$$ E(Z_{NT}^{\ast })\,=\,\left[\! \begin{array}{ccc} (I_{NT}-\rho_{1}W_{NT})E(Z_{1,NT}) & 0 & 0 \\ 0 & (I_{NT}-\rho_{2}W_{NT})E(Z_{2,NT}) & 0 \\ 0 & 0 & (I_{NT}-\rho_{3}W_{NT})E(Z_{3,NT}) \end{array} \!\right]\!\! , $$

where

$$\begin{array}{@{}rcl@{}} E(Z_{1,NT}) &=&\left[ \begin{array}{cccc} W_{NT}E(y_{1,NT}) & E(y_{2,NT}) & E(y_{3,NT}) & X_{1,NT} \end{array} \right] , \\ E(Z_{2,NT}) &=&\left[ \begin{array}{cccc} W_{NT}E(y_{2,NT}) & E(y_{1,NT}) & E(y_{3,NT}) & X_{2,NT} \end{array} \right] , \\ E(Z_{3,NT}) &=&\left[ \begin{array}{cccc} W_{NT}E(y_{3,NT}) & E(y_{1,NT}) & E(y_{2,NT}) & X_{3,NT} \end{array} \right] . \end{array} $$

Recalling that \(E(\varepsilon_{NT})=0\), it follows from Eq. 19 that

$$ E(y_{NT})=\left( I_{3NT}-B_{NT}\right)^{-1}{\Gamma}_{NT}x_{NT}, $$

or, alternatively, collapsing the parameters of \({\Gamma}_{NT}\) into vector form, we have

$$ E(y_{NT})=\left( I_{3NT}-B_{NT}\right)^{-1}diag(X_{NT})\gamma , $$

where \(diag(X_{NT})=\left [\begin {array}{ccc}X_{1,NT} & 0 & 0 \\0 & X_{2,NT} & 0 \\0 & 0 & X_{3,NT} \end {array}\right ]\) and \(\gamma =\left [\begin {array}{c}\gamma _{1} \\\gamma _{2} \\\gamma _{3} \end {array}\right ] \). As a function of the \(\lambda\)s, \(\beta\)s, and \(\gamma\)s, \(E(y_{NT})\) can be fitted using some initial estimates. This fitted value can then be decomposed into \(\widehat {E(y_{1,NT})}, \widehat {E(y_{2,NT})}\), and \(\widehat {E(y_{3,NT})}\). After plugging these fitted values back into Eq. 35, we have fitted values of the \(E(Z_{m,NT})\)s. Combining the \( \widehat {E(Z_{m,NT})}\)s and some initial estimates of the \(\rho\)s in Eq. 34, the updated instrumental variables are

$$H^{\ast }=\left[ \begin{array}{ccc} (I_{NT}-\widehat{\rho }_{1}W_{NT})\widehat{E(Z_{1,NT})} & 0 & 0 \\ 0 & (I_{NT}-\widehat{\rho }_{2}W_{NT})\widehat{E(Z_{2,NT})} & 0 \\ 0 & 0 & (I_{NT}-\widehat{\rho }_{3}W_{NT})\widehat{E(Z_{3,NT})} \end{array} \right] . $$

Along the lines of a few other best IVs suggested in Lee (2003), we also tested an alternative optimal IV matrix as

$$H_{a}=\left[ \begin{array}{ccc} (I_{NT}-\widehat{\rho }_{1}W_{NT})\widehat{E(Z_{1,NT})}_{a} & 0 & 0 \\ 0 & (I_{NT}-\widehat{\rho }_{2}W_{NT})\widehat{E(Z_{2,NT})}_{a} & 0 \\ 0 & 0 & (I_{NT}-\widehat{\rho }_{3}W_{NT})\widehat{E(Z_{3,NT})}_{a} \end{array} \right] , $$

where the \(\widehat {E(Z_{m,NT})}_{a}\)s have the same functional form as in Eq. 35, but the components of \(\widehat {E(y_{m,NT})}\)s in Eq. 35 are derived as

$$ \widehat{E(y_{NT})}=\left( I_{3NT}-\widehat{B}_{NT}\right)^{-1}diag(X_{NT}). $$

Regarding the estimation of our SESSAR model, the optimal IV approach follows the original GS3SLS routine, with the only exception being that \(H\) is replaced with \(H^{\ast }\) (or \(H_{a}\)) in the last stage, after all three equations are stacked together. The initial estimates of \(\widehat {B}_{NT}\) and \(\widehat {\gamma }\) are constructed in the first two stages of the GS3SLS procedure.
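The construction of the diagonal blocks of \(H^{\ast }\) can be sketched as below. This is a hedged illustration, not the authors' code. Eq. 19 is not reproduced in this appendix, so the sketch assumes a structure for \(B_{NT}\) inferred from the form of \(E(Z_{m,NT})\): the own spatial lag \(\lambda_{m}W_{NT}\) on the \(m\)-th diagonal block and \(\beta_{ml}I_{NT}\) on the off-diagonal blocks. The function name `ideal_instrument_blocks` and all inputs are hypothetical.

```python
import numpy as np

def ideal_instrument_blocks(W, Xs, lams, betas, gammas, rhos):
    """Return the blocks (I - rho_m W) E_hat(Z_m), m = 1..M, of H*.

    Assumed structure (not taken from the source):
    B = diag(lams) kron W + betas kron I, with betas having a zero diagonal.
    """
    n, M = W.shape[0], len(Xs)
    B = np.kron(np.diag(lams), W) + np.kron(betas, np.eye(n))
    # E(y) = (I - B)^{-1} diag(X) gamma, stacked over equations.
    rhs = np.concatenate([X @ g for X, g in zip(Xs, gammas)])
    Ey = np.linalg.solve(np.eye(M * n) - B, rhs).reshape(M, n)
    blocks = []
    for m in range(M):
        others = [Ey[l] for l in range(M) if l != m]
        # E(Z_m) = [W E(y_m), E(y_l) for l != m, X_m]
        EZ = np.column_stack([W @ Ey[m], *others, Xs[m]])
        blocks.append(EZ - rhos[m] * (W @ EZ))
    return blocks

# Small synthetic illustration (assumed values for the initial estimates).
rng = np.random.default_rng(2)
n = 6
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
Xs = [rng.standard_normal((n, 2)) for _ in range(3)]
gammas = [np.array([1.0, -1.0]), np.array([0.5, 2.0]), np.array([-0.3, 0.7])]
lams = np.array([0.2, 0.1, 0.3])
betas = np.array([[0.0, 0.1, -0.2], [0.3, 0.0, 0.1], [-0.1, 0.2, 0.0]])
blocks = ideal_instrument_blocks(W, Xs, lams, betas, gammas, rhos=[0.3, 0.2, 0.1])
print([b.shape for b in blocks])
```

Each block has \(k_{m}+3\) columns, matching the columns of \(Z_{m,NT}^{\ast }\). Placing the blocks on the diagonal of a block-diagonal matrix gives \(H^{\ast }\).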

Appendix B: Modified Spatial J Test

The spatial J test focuses on the estimated value of \(\alpha_{m}\). If the null model is correctly specified and the alternative model does not have explanatory power, the estimated value of \(\alpha_{m}\) should not be significantly different from zero. To derive a consistent and efficient estimate of \(\alpha_{m}\) and correctly specify its asymptotic distribution for our spatial simultaneous equation system, we modify the spatial J test procedure described in Kelejian and Piras (2009) by incorporating the GS3SLS technique. Specifically, we adopt the GS3SLS method for the null model of Eq. 16 and the alternative model of Eq. 23 respectively, and then use the Feasible Generalized Least Squares (FGLS) method. The exact steps used to construct the test are described below.

Step 1:

Estimate the parameters \(\delta_{m}\) and \(\rho_{m}\) of the null model, and \({\delta _{m}^{a}}\) of the alternative model, using the GS3SLS method. Denote the estimated coefficients as \(\widehat {\delta }=(\widehat {\delta }_{1}^{\prime },\widehat {\delta }_{2}^{\prime },\widehat { \delta }_{3}^{\prime })^{\prime }\), \(\widehat {\rho }= (\widehat {\rho }_{1}, \widehat {\rho }_{2},\widehat {\rho }_{3})\), and \(\widehat {\delta }^{a}=(\widehat {\delta }_{1}^{a\prime },\widehat {\delta }_{2}^{a\prime },\widehat { \delta }_{3}^{a\prime })^{\prime }\).

Step 2:

Use the results in Step 1 to compute the estimates of \( y_{m,NT}^{+}, Z_{m,NT}^{+}\), and \(Z_{m,NT}^{a+}{\delta _{m}^{a}},\) which are written as

$$\begin{array}{@{}rcl@{}} y_{m,NT}^{+} &=&(I_{NT}-\widehat{\rho }_{m}W_{NT})y_{m,NT}, \\ Z_{m,NT}^{+} &=&(I_{NT}-\widehat{\rho}_{m}W_{NT})Z_{m,NT},\\ Z_{m,NT}^{a+}\widehat{\delta }_{m}^{a} &=&(I_{NT}-\widehat{\rho }_{m}W_{NT})Z_{m,NT}^{a}\widehat{\delta }_{m}^{a}. \end{array} $$
Step 3:

Let \(F_{m,NT}^{+}=[Z_{m,NT}^{+},Z_{m,NT}^{a+}\widehat { \delta }_{m}^{a}]\), and \(\phi _{m}^{\prime }=[\delta _{m}^{\prime },\alpha _{m}]\). Equation 26 can be simplified as

$$ y_{m,NT}^{+}=F_{m,NT}^{+}\phi_{m}+\varepsilon_{m,NT}^{+}, $$

where the parameter vector \(\phi_{m}\) can be estimated using a 2SLS estimator. Now define

$$\widetilde{H} =(X_{NT},W_{NT}X_{NT},W_{NT}^{2}X_{NT},M_{NT}X_{NT},M_{NT}^{2}X_{NT}), $$

and let \(\widetilde {P}_{H}=\widetilde {H}(\widetilde {H}^{\prime }\widetilde {H})^{-1}\widetilde {H}^{\prime }\), and \(\widehat {F}_{m,NT}^{+}=\widetilde {P}_{H}F_{m,NT}^{+}\). Then the 2SLS estimator has the following form

$$\widehat{\phi }_{m,2sls}=\left(\widehat{F}_{m,NT}^{+\prime }F_{m,NT}^{+}\right)^{-1} \widehat{F}_{m,NT}^{+\prime }y_{m,NT}^{+}. $$

To consistently estimate \({\Sigma}^{+}\), let

$$\widehat{\varepsilon }_{m,NT}^{+}=y_{m,NT}^{+}-F_{m,NT}^{+}\widehat{\phi }_{m,2sls}, $$

and then compute

$$\begin{array}{@{}rcl@{}} \widehat{\sigma }_{ml}^{+} &=&\frac{1}{NT}\widehat{\varepsilon }_{m,NT}^{+\prime }\widehat{\varepsilon }_{l,NT}^{+},\\ m,l &=&1,2,3. \end{array} $$

\(\widehat {\Sigma }^{+}\) then comprises the elements \(\widehat {\sigma }_{ml}^{+}\).

Step 4:

Stack the equations in Eq. 39 over \(m=1,2,3\) as

$$y_{NT}^{+}=F_{NT}^{+}\phi +\varepsilon_{NT}^{+}, $$

where \(y_{NT}^{+}=(y_{1,NT}^{+\prime },y_{2,NT}^{+\prime },y_{3,NT}^{+\prime })^{\prime }, F_{NT}^{+}=\left [ \begin {array}{ccc}F_{1,NT}^{+} & 0 & 0 \\0 & F_{2,NT}^{+} & 0 \\0 & 0 & F_{3,NT}^{+} \end {array}\right ]\), and \(\phi =(\phi _{1}^{\prime },\phi _{2}^{\prime },\phi _{3}^{\prime })^{\prime }\). The variance covariance matrix of \(\varepsilon _{NT}^{+}\) is then

$$E\varepsilon_{NT}^{+}\varepsilon_{NT}^{+\prime }={\Sigma}^{+}\otimes I_{NT}. $$

The GS3SLS estimator of ϕ is then obtained as

$$\widehat{\phi }_{3sls}=\left[\widehat{F}_{NT}^{+\prime }\left(\left(\widehat{\Sigma }^{+}\right)^{-1}\otimes I_{NT}\right)F_{NT}^{+}\right]^{-1}\widehat{F}_{NT}^{+\prime }\left(\left(\widehat{\Sigma }^{+}\right)^{-1}\otimes I_{NT}\right)y_{NT}^{+}, $$

where \(\widehat {F}_{NT}^{+}=(I_{3}\otimes \widetilde {P}_{H})F_{NT}^{+}=\left [\begin {array}{ccc}\widetilde {P}_{H}F_{1,NT}^{+} & 0 & 0 \\0 & \widetilde {P}_{H}F_{2,NT}^{+} & 0 \\0 & 0 & \widetilde {P}_{H}F_{3,NT}^{+} \end {array}\right ]\), and \(\widehat {\Sigma }^{+} =\left [\begin {array}{ccc}\widehat {\sigma }_{11}^{+} & \widehat {\sigma }_{12}^{+} & \widehat {\sigma }_{13}^{+} \\ \widehat {\sigma }_{21}^{+} & \widehat {\sigma }_{22}^{+} & \widehat {\sigma }_{23}^{+} \\ \widehat {\sigma }_{31}^{+} & \widehat {\sigma }_{32}^{+} & \widehat {\sigma }_{33}^{+} \end {array}\right ] \). The small sample distribution of \(\widehat {\phi }_{3sls}\) is approximated as follows

$$\widehat{\phi }_{3sls}\sim N\left(\phi ,\left[\widehat{F}_{NT}^{+\prime }\left(\left(\widehat{\Sigma }^{+}\right)^{-1}\otimes I_{NT}\right)F_{NT}^{+}\right]^{-1}\right). $$
Step 5:

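Let \(R=\left [\begin {array}{cccccc}0_{k_{1}+3} & 1 & 0_{k_{2}+3} & 0 & 0_{k_{3}+3} & 0 \\ 0_{k_{1}+3} & 0 & 0_{k_{2}+3} & 1 & 0_{k_{3}+3} & 0 \\ 0_{k_{1}+3} & 0 & 0_{k_{2}+3} & 0 & 0_{k_{3}+3} & 1 \end {array}\right ]_{3\times (k_{1}+k_{2}+k_{3}+12)},\) where \(0_{k}\) denotes a \(1\times k\) row of zeros, \(k_{1}\) is the number of exogenous variables in \(X_{1,NT}\), \(k_{2}\) is the number of exogenous variables in \(X_{2,NT}\), and \(k_{3}\) is the number of exogenous variables in \(X_{3,NT}\). Each row of \(R\) selects one \(\alpha_{m}\), so the null model \(H_{0}\) suggests that \(R\phi =0\), that is, \(\alpha_{1}=\alpha_{2}=\alpha_{3}=0\). The Wald test statistic for this hypothesis can be written as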

$$ \left(R\widehat{\phi }\right)^{\prime }\left\{R\left[\widehat{F}_{NT}^{+\prime }\left(\left(\widehat{\Sigma }^{+}\right)^{-1}\otimes I_{NT}\right)F_{NT}^{+}\right]^{-1}R^{\prime }\right\}^{-1}(R\widehat{\phi } ), $$

which is asymptotically distributed as \(\chi^{2}\) with three degrees of freedom. The null model \(H_{0}\) is rejected if the test statistic is greater than the critical value of \(\chi^{2}(3)\) at a given significance level.
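As a minimal sketch of Step 5, the Wald statistic can be computed from the stacked estimate and its approximate covariance matrix. The names `spatial_j_wald`, `phi_hat`, and `V`, and the toy numbers, are assumptions made for illustration. The sketch takes \(R\) to have one row per equation, each row selecting the corresponding \(\alpha_{m}\), so that the statistic has the three degrees of freedom stated above. It compares the statistic with the 5% critical value of \(\chi^{2}(3)\), approximately 7.815.

```python
import numpy as np

CHI2_3_CRIT_5PCT = 7.815  # 95th percentile of the chi-square(3) distribution

def spatial_j_wald(phi_hat, V, ks):
    """Wald statistic for H0: alpha_1 = alpha_2 = alpha_3 = 0.

    phi_hat stacks phi_m = (delta_m', alpha_m)', where delta_m has k_m + 3
    entries; V is the estimated covariance matrix of phi_hat.
    """
    sizes = np.array([k + 4 for k in ks])    # length of each phi_m
    last = np.cumsum(sizes) - 1              # position of alpha_m in phi_hat
    R = np.zeros((len(ks), sizes.sum()))
    R[np.arange(len(ks)), last] = 1.0        # one restriction per equation
    r = R @ phi_hat
    stat = float(r @ np.linalg.solve(R @ V @ R.T, r))
    return stat, R

# Toy illustration (made-up numbers, identity covariance).
ks = (2, 1, 3)
phi_hat = np.zeros(sum(k + 4 for k in ks))
phi_hat[[5, 10, 17]] = [1.0, 2.0, 2.0]       # the three alpha estimates
stat, R = spatial_j_wald(phi_hat, np.eye(phi_hat.size), ks)
reject_null = stat > CHI2_3_CRIT_5PCT
print(stat, reject_null)
```

In practice `V` would be the bracketed inverse matrix from Step 4, and a library routine for the \(\chi^{2}\) distribution could supply a p-value in place of the fixed critical value.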

Cite this article

Kiefer, H., Kiefer, L. & Mayock, T. Should we Fear the Shadow? House Prices, Shadow Inventory, and the Nascent Housing Recovery. J Real Estate Finan Econ 52, 272–321 (2016).
