U.S. House Prices by Census Division: Persistence, Trends and Structural Breaks

This paper uses fractional integration methods to examine persistence, trends and structural breaks in United States house prices, more specifically the monthly Federal Housing Finance Agency House Price Index for census divisions and the United States as a whole, over the period from January 1991 to August 2022. The full sample estimates imply that the order of integration of the series is above one in all cases, and is particularly high for the aggregate series, implying high levels of persistence. However, when the possibility of structural breaks is taken into account, segmented trends are detected. The subsample estimates of the fractional differencing parameter tend to be lower, with mean reversion occurring in a number of cases. This means that shocks to the series are expected to be transitory in these subsamples, disappearing by themselves in the long run. In addition, the time trend coefficient is at its highest in the last subsample, which in most cases starts around May 2020, coinciding with the beginning of the coronavirus pandemic. The results provide clear evidence of differences between census divisions, which implies that appropriate housing policies should be designed at the local (rather than at the federal) level.


Introduction
of different areas of a country (the U.S., in the current case) that are not captured by the aggregate price series. Thus, different policy prescriptions might be appropriate in each case. The other issue addressed by the current analysis is the possible presence of breaks in the series under examination, which is also of key importance to understand changes in the housing market that might have occurred as a result of a variety of factors (fundamentals or others), again with implications for the design of effective stabilisation policies.

Data Description and Modelling Framework
Monthly Federal Housing Finance Agency (FHFA) House Price Index data for census divisions and the U.S. as a whole were analysed. The census divisions are East North Central, East South Central, Middle Atlantic, Mountain, New England, Pacific, South Atlantic, West North Central, and West South Central. The sample period is January 1991 to August 2022. The series are not seasonally adjusted and were obtained from the House Price Index Datasets of the Federal Housing Finance Agency (2023).
The model is specified as follows:
$$ y_t = \alpha + \beta t + x_t, \qquad (1 - B)^d x_t = u_t, \tag{1} $$
where $y_t$ stands for the series of interest, α and β denote the constant and the coefficient on a linear time trend, respectively, B is the backshift operator, i.e., $B x_t = x_{t-1}$, and $u_t$ is a short-memory process which is integrated of order 0. Note that d is allowed to take any real value, including fractional ones. Thus, as already mentioned, the chosen framework encompasses a wide range of specifications, such as the classical trend stationary I(0) model if d = 0, the unit root case if d = 1, antipersistence if d < 0, and long memory if d is positive and has a fractional value. In the latter case, if 0 < d < 0.5, the series is still covariance stationary. If d < 1, mean reversion occurs.
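As a minimal illustration of the operator (1 - B)^d in the model above, the following sketch expands it as a binomial series and applies it to a series. The function names are hypothetical, pre-sample values are assumed to be zero, and this is not claimed to be the estimation code used in the paper. The coefficients of the inverse operator (1 - B)^(-d) give the response of the series to a unit shock, which shows why d < 1 is associated with mean reversion.

```python
def frac_diff_weights(d, n):
    """First n coefficients w_k of the expansion (1 - B)^d = sum_k w_k B^k."""
    weights = [1.0]
    for k in range(1, n):
        # Recursion implied by the binomial series: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(x, d):
    """Apply (1 - B)^d to x, treating pre-sample values as zero (an assumption)."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


def impulse_responses(d, n):
    """Coefficients of (1 - B)^(-d): effect of a unit shock in u_t on x_{t+k}."""
    return frac_diff_weights(-d, n)


if __name__ == "__main__":
    # With d = 1 the filter reduces to first differences (first value kept as is).
    print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))
    # Response to a unit shock after 199 periods for three values of d.
    for d in (0.5, 1.0, 1.5):
        print(d, impulse_responses(d, 200)[-1])
```

For d = 0.5 the response decays towards zero as the horizon grows, for d = 1 it stays at one, and for d = 1.5 it keeps growing, so shocks are transitory only when d < 1.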

Empirical Results
First, the values of d were estimated from the model given by Eq. (1) under the assumption that the error term, $u_t$, is a white noise process. Following the standard literature on unit roots (e.g., Bhargava, 1986; Schmidt & Phillips, 1992), three specifications were considered, including respectively: (i) no deterministic terms, i.e. α = β = 0; (ii) a constant only, i.e. β = 0; and (iii) both a constant and a linear time trend, i.e. α ≠ 0 and β ≠ 0. The specification was selected on the basis of the t-values of the estimated coefficients on these deterministic terms. The results for each series are displayed in Table 1 (upper part for the original data and lower part for the log-transformed data). Starting with the original data (Table 1, upper part), it can be seen that the coefficient on the time trend is significant for six out of the nine census divisions examined (i.e., in all cases except Mountain, Pacific and South Atlantic), but is insignificant for the U.S. aggregate data. The largest time trend coefficients are those for West South Central (0.761) and East South Central (0.758).
The estimated values of d are significantly higher than one in all cases, ranging from 1.24 (East South Central) and 1.25 (West North Central) to 1.53 (Mountain) and 1.55 (Pacific). For the U.S. aggregate data, the time trend is insignificant and the order of integration is 1.70, much higher than for the individual census divisions, which is probably due to the aggregation effect on the degree of integration of the series (Granger, 1980; Robinson, 1978). Focussing now on the log-transformed data (Table 1, lower part), the time trend is significantly positive in all cases except for the Pacific division and the aggregate data. The estimates of d are slightly smaller than before (between 1.13 for East South Central and 1.45 for Pacific), and again higher for the aggregate series (d = 1.52), the unit root null hypothesis being [...]
The results considered so far might be biased owing to the strong assumption that the residuals are a white noise process. Thus, in what follows, autocorrelation is allowed for. In particular, rather than imposing a parametric autoregressive moving average (ARMA) model, which would require specifying the correct AR and MA orders (not a straightforward task in the context of fractional integration; Beran et al., 1998), the non-parametric modelling approach of Bloomfield (1973) was applied. This approach is based on a spectral density function whose logarithm approximates well that of AR structures. Table 2 reports the corresponding results for the original and log-transformed data in the upper and lower parts, respectively. The time trend is now insignificant in every single case, the intercept being the only deterministic term required in the model. As for d, its estimated values are again significantly [...]
Finally, given the monthly frequency of the data, a seasonal AR(1) process in the error term was allowed for. As reported in Table 3, the results are very similar to those obtained under the assumption of white noise errors. The time trend is not required for the Mountain, Pacific, and South Atlantic divisions or for the aggregate data (U.S.), and the degrees of integration are higher than one in all cases, their estimated values being larger for the original data than for the log-transformed data.
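To illustrate why the Bloomfield (1973) specification can stand in for AR structures, the sketch below compares its log spectral density with that of an AR(1) process. The functional form used, f(λ) = σ²/(2π) exp(2 Σ τ_r cos(λr)), is the one usually attributed to Bloomfield in this literature; it is not printed in the text above, so it should be read as an assumption, and the function names and parameter values are purely illustrative.

```python
import math


def bloomfield_log_density(lam, taus, sigma2=1.0):
    """Log of f(lam) = sigma2 / (2*pi) * exp(2 * sum_r tau_r * cos(lam * r))."""
    s = sum(tau * math.cos(lam * r) for r, tau in enumerate(taus, start=1))
    return math.log(sigma2 / (2 * math.pi)) + 2 * s


def ar1_log_density(lam, phi, sigma2=1.0):
    """Log spectral density of an AR(1) process with coefficient phi, |phi| < 1."""
    return math.log(sigma2 / (2 * math.pi)) - math.log(
        1 - 2 * phi * math.cos(lam) + phi ** 2
    )


if __name__ == "__main__":
    phi, m = 0.5, 20  # illustrative AR coefficient and number of Bloomfield terms
    # Choosing tau_r = phi^r / r makes the truncated Bloomfield series
    # match the series expansion of the AR(1) log spectrum.
    taus = [phi ** r / r for r in range(1, m + 1)]
    for lam in (0.1, 1.0, 3.0):
        print(lam, bloomfield_log_density(lam, taus), ar1_log_density(lam, phi))
```

Because the log spectrum is linear in the τ parameters, a small number of them can mimic AR-type short-run dynamics without choosing AR and MA orders.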
The high degree of persistence implied by the estimated values of d reported in Tables 1, 2, and 3 might be the result of misspecification due to the presence of structural breaks that have not been taken into account. In fact, given the long timespan covered by the data, this is most likely to have occurred. Therefore, the Bai and Perron (2003) tests for multiple breaks were conducted, as well as the version proposed for the fractional integration case (Gil-Alana, 2008). The break dates detected by means of these two sets of tests were identical and are displayed in Table 4. Two breaks were found in the case of West South Central, three in the case of East North Central and West North Central, four in the majority of cases, and five in the Pacific case. Many of the series exhibit breaks around April to June 2007, that is, just before the U.S. sub-prime mortgage crisis. Breaks were also found between April and August 2011, immediately before Operation Twist, when the Fed restructured its debt portfolio by selling short-term T-bills and buying long-term debt with the aim of flattening the yield curve and boosting the mortgage market as well as other forms of credit. In addition, breaks were found in May-June 2020, following the end of the shortest U.S. recession on record (caused by the coronavirus pandemic).
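The idea behind least-squares break dating can be sketched for the simplest case of a single break in a linear trend. This is a deliberately reduced illustration, not the Bai and Perron (2003) procedure (which handles multiple breaks) nor the fractional version of Gil-Alana (2008); the function names, the minimum segment length and the synthetic data are all assumptions.

```python
def fit_line_ssr(ts, ys):
    """OLS fit of y = a + b*t; returns (intercept, slope, sum of squared residuals)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ssr = sum((y - intercept - slope * t) ** 2 for t, y in zip(ts, ys))
    return intercept, slope, ssr


def find_single_break(ys, min_len=5):
    """Return (b, ssr): the split index b minimising the total SSR of two
    separate linear trends fitted to ys[:b] and ys[b:]."""
    n = len(ys)
    ts = list(range(n))
    best = None
    for b in range(min_len, n - min_len + 1):
        ssr = fit_line_ssr(ts[:b], ys[:b])[2] + fit_line_ssr(ts[b:], ys[b:])[2]
        if best is None or ssr < best[1]:
            best = (b, ssr)
    return best


if __name__ == "__main__":
    # Synthetic series: slope 1 for 20 periods, then a level shift and slope 3.
    ys = [float(t) for t in range(20)] + [3.0 * t - 30.0 for t in range(20, 40)]
    print(find_single_break(ys))
```

With several breaks, the same criterion is minimised jointly over all admissible partitions of the sample, which is where the dynamic programming of the full procedure comes in.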
Table 5a-j displays the estimated coefficients for each series and each subsample. In the case of the East North Central series, the estimates of d are now much smaller than when considering the whole sample. Thus, the unit root null hypothesis cannot be rejected in the first three subsamples (with the data ending in May 2020). Only for the last subsample (June 2020-August 2022) was the estimate of d found to be much higher than one. The time trend is positive in the first, second and especially in the last subsample, being negative for the time period between May 2006 and October 2011.
Concerning the East South Central series, the values of d are now even smaller. The unit root hypothesis is rejected in favour of mean reversion (d < 1) during the first, third and fourth subsamples. It cannot be rejected during the second and the last subsamples.
[...] significantly positive, except the third one for the period starting in June 2007. Once again, the estimated time trend coefficient is significant and particularly high in the last subsample. Very similar results were obtained for New England, though now mean reversion (i.e., significant evidence of d smaller than one) was found for the third and fourth subsamples (December 2005-January 2012, February 2012-May 2020) and a negative trend for the third subsample (December 2005-January 2012). The positive trend coefficients are equal to 0.1538 for the first subsample, 1.4513 for the second subsample, 0.7330 for the fourth subsample, and 3.8984 for the final subsample starting in June 2020.
In the case of the Mountain series, the results are slightly different. Mean reversion was not found in any single case, and d is statistically higher than one in the second and last subsamples, in the latter case being insignificant. Five breaks were detected in the case of the Pacific series. Mean reversion did not occur in any subsample, and d was estimated to be much higher than one, especially in the last subsample. The time trend is negative in the first subsample, positive in the second, third and fifth, and insignificant in the fourth and sixth.
Regarding the South Atlantic series, breaks were detected in January 1998, April 2007, July 2011, and May 2020. Mean reversion occurred in the fourth subsample (from August 2011 to May 2020) and the time trend was insignificant in the last subsample.
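The mapping from break dates to subsamples can be made explicit with a small helper. The convention assumed here, inferred from the subsample dates quoted in the text, is that each break date is the last month of its subsample and the next subsample starts in the following month; the function names are hypothetical.

```python
def next_month(ym):
    """Month following the (year, month) pair ym."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def n_months(start, end):
    """Number of monthly observations from start to end, both inclusive."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


def subsamples(start, end, breaks):
    """Split [start, end] into segments, each break being the last month of
    its segment (assumed convention). Dates are (year, month) tuples."""
    segments = []
    seg_start = start
    for b in sorted(breaks):
        segments.append((seg_start, b))
        seg_start = next_month(b)
    segments.append((seg_start, end))
    return segments


if __name__ == "__main__":
    # South Atlantic break dates as reported in the text.
    breaks = [(1998, 1), (2007, 4), (2011, 7), (2020, 5)]
    for seg in subsamples((1991, 1), (2022, 8), breaks):
        print(seg, n_months(*seg))
```

Printing the number of months per segment also makes it easy to see which subsamples are short, which matters when estimation fails to converge.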
In the case of the West North Central series, mean reversion took place in the second (July 2007-April 2011) and third (May 2011-May 2020) subsamples, with a significant negative trend in the former. There were only two breaks (July 2011 and June 2020) in the West South Central series. Mean reversion occurred in the second subsample, and the time trend is significantly positive in all three subsamples. Finally, there were four breaks in the U.S. aggregate series (January 1998, April 2007, August 2011 and May 2020), and no mean reversion in any single case. The time trend coefficients are all positive, although convergence could not be achieved for the third subsample (May 2007-August 2011), probably as a result of the small number of observations. In the other cases, the time trend coefficient was significantly positive, again being particularly high in the last subsample.
Table note: In brackets are the 95% confidence bands in column 2, and the t-statistics for the estimated coefficients in columns 3 and 4. Convergence was not achieved in the case of the third sub-sample, probably due to the small number of observations. In all cases, the p-values were <0.005. The starting date is January 1991 and the final date is August 2022. The series were obtained from the House Price Index Datasets of the Federal Housing Finance Agency (2023).

Conclusions
This paper uses fractional integration methods to analyse the behaviour of U.S. house prices, more specifically the monthly Federal Housing Finance Agency (FHFA) House Price Index for census divisions and the U.S. as a whole, over the period from January 1991 to August 2022. The full sample estimates imply that the order of integration of the series is above one in all cases, and is particularly high for the aggregate series. However, when the possibility of structural breaks is taken into account, segmented trends are detected. The subsample estimates of the fractional differencing parameter tend to be lower, with mean reversion occurring in a number of cases, and the time trend coefficient being at its highest in the last subsample, which in most cases started around May 2020.
On the whole, it is clear that there is heterogeneity between housing markets in different geographical areas of the U.S., which might reflect differences in the number of buyers and sellers in each case as well as other local factors. These cannot be captured by the aggregate series. Thus, it is important to obtain evidence for the various census divisions as well. In particular, the individual series were found to be less persistent than the aggregate one, and also to be subject to structural change. The detected breaks appeared to correspond to well-known economic and policy developments (such as the sub-prime mortgage crisis, changes in the Fed's debt portfolio, and the rebound after the early stages of the coronavirus pandemic). House price persistence is transmitted to other macroeconomic and financial variables. Therefore, accurate information on persistence is crucial for policy decisions, with different policy measures needed in response to shocks depending on the degree of persistence (Himmelberg et al., 2005). The present study offers thorough evidence on this property for U.S. house prices in different geographical areas and time periods, and thus has important policy implications, in particular for crisis management and/or prevention.
The present study has some limitations. In particular, the univariate nature of the chosen method does not allow for possible correlations between the various series. This would require estimating a panel model and is beyond the scope of the current analysis, which focuses instead on the stochastic properties of the individual series. Spatial econometrics would be an alternative approach, though totally unrelated to long-memory models. It would also be interesting to distinguish between prices for different types of houses, or to convert all of them to equivalent houses and then analyse the prices. This is left for subsequent studies. Future research could also allow for non-linearities in house prices. One possible approach would be based on Chebyshev's polynomials (Bierens, 1997), which do not produce abrupt changes in the series (unlike models with structural breaks), and can easily be used in the context of fractional integration. An alternative framework would include non-linear (deterministic) trends based on Fourier functions in time or neural networks, still within the long-memory framework.
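As an indication of what the Chebyshev approach would involve, the sketch below builds Chebyshev polynomials in time of the form commonly associated with Bierens (1997), namely P_0(t) = 1 and P_i(t) = √2 cos(iπ(t - 0.5)/T) for t = 1, ..., T. This form is not given in the text and is stated here as an assumption; the function name and the choice of T and m are illustrative only.

```python
import math


def chebyshev_time_polynomials(T, m):
    """Return a list of m + 1 rows; row i holds P_i(t) for t = 1, ..., T, with
    P_0(t) = 1 and P_i(t) = sqrt(2) * cos(i * pi * (t - 0.5) / T) for i >= 1."""
    rows = [[1.0] * T]
    for i in range(1, m + 1):
        rows.append(
            [math.sqrt(2) * math.cos(i * math.pi * (t - 0.5) / T) for t in range(1, T + 1)]
        )
    return rows


if __name__ == "__main__":
    T, m = 12, 3  # illustrative sample size and polynomial order
    P = chebyshev_time_polynomials(T, m)
    # Sample cross-moments: close to 1 on the diagonal and to 0 elsewhere.
    for i in range(m + 1):
        print([round(sum(a * b for a, b in zip(P[i], P[j])) / T, 6) for j in range(m + 1)])
```

These regressors would replace the linear trend α + βt with a smooth non-linear deterministic component, while the fractional differencing part of the model stays unchanged. Because the terms are orthonormal over the sample, adding a higher-order term does not alter the information carried by the lower-order ones.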