Repeat Sales Index for Thin Markets
DOI: 10.1007/s11146-009-9203-1
Cite this article as: Francke, M.K. J Real Estate Finan Econ (2010) 41: 24. doi:10.1007/s11146-009-9203-1
Abstract
The repeat sales model is commonly used to construct reliable house price indices in the absence of individual characteristics of the real estate. Several adaptations of the original model by Bailey et al. (J Am Stat Assoc 58:933–942, 1963) have been proposed in the literature. They all use a dummy variable approach for measuring price indices. In order to reduce the impact of transaction price noise on the estimates of price indices, Goetzmann (J Real Estate Finance Econ 5:5–53, 1992) used a random walk with drift process for the log price levels instead of the dummy variable approach. The model proposed in this article can be interpreted as a generalization of the Goetzmann methodology. We replace the random walk with drift model by a structural time series model, in particular by a local linear trend model in which both the level and the drift parameter can vary over time. An additional variable, the reciprocal of the time between sales, is included in the repeat sales model to deal with the effect of the time between sales on the estimated returns. This approach is robust and can be applied in thin markets where relatively few selling prices are available. In contrast to the dummy variable approach, the structural time series model enables prediction of the price level based on preceding and subsequent information, implying that an estimate of the price level can be provided even for time periods in which no observations are available. Conditional on the variance parameters, an estimate of the price level can be obtained by applying regression in the general linear model with a prior for the price level generated by the local linear trend model. The variance parameters can be estimated by maximum likelihood. The model is applied to several subsets of selling prices in the Netherlands. Results are compared to standard repeat sales models, including the Goetzmann model.
Keywords
House prices · Kalman filter · Signal extraction · Smoothing · State space models
Introduction
The value of the housing stock is a significant portion of the national wealth. The total value of the private real estate market in the Netherlands was approximately € 1,239 billion in 2007, corresponding to 436% of the real disposable household income in 2007, see CPB (2009). For that reason, many organizations and individuals, such as financial institutions or house owners, are interested in house price movements. A frequently used method to model house price movements is the repeat sales approach. For example, the House Price Index of the Kadaster (the Dutch Land Registry) is an application of the repeat sales model for the Netherlands, see Jansen et al. (2008).
Individual house characteristics, such as house size, plot size and age, can be omitted from the repeat sales model. This is one of the main advantages of the repeat sales model when individual house characteristics are not available. On the negative side, only selling prices of houses sold more than once can be used in a repeat sales model; single sales are not used in the estimation. A more general problem, namely that sold houses are a nonrandom selection of the entire housing stock (sample selection bias), is addressed by Gatzlaff and Haurin (1997) and Hwang and Quigley (2004), based on a procedure proposed by Heckman (1979). In this article, however, we do not address the problem of sample selection bias.
An implicit assumption in the repeat sales model is that the house characteristics and their impact on house prices do not change over time. This assumption clearly does not hold for the age of the house, which differs between selling years. In the repeat sales model the effect of age is embedded in the model’s estimates of the time effect, see Cannaday et al. (2005).
In this article we focus on the specification of the time effect in the repeat sales model by specifying it as a structural time series model. A structural time series model is a model in which the trend, seasonal and error terms, plus other relevant components, are modeled explicitly. In particular, we focus on the estimation of price trends in thin markets where the number of repeat sales is relatively low and, hence, the impact of transaction price noise on the estimation of the price trends is high.
Structural time series models have already been used in real estate applications. For example, Schwann (1998) estimates a hedonic price index for a thin market, where the periodic returns follow a stationary autoregressive process. Francke and De Vos (2000) estimate hedonic price indices by a hierarchical trend model, in which different trends are simultaneously estimated for different market segments. Both models have the format (in logs): observed series = trend + regression effects + irregular, where some structure for the trend component is assumed. To our knowledge, structural time series models have not previously been used to estimate repeat sales price indices.
In the structural time series repeat sales model, the trend component is modeled explicitly using a local linear trend (LLT) model. The LLT model depends on two variance parameters, which are estimated by maximizing the appropriate likelihood function. Small values of these parameters result in smooth price indices. The approach of Goetzmann (1992) is a special case of the LLT repeat sales model. In this paper, we provide an alternative to his two-step estimation procedure.
Next, we examine the effect of the time between repeat sales on the estimation of the price level. Empirical evidence shows that large profits are made when the time between sales is relatively short, for example within the first 6 months. This can be the result of “flipping houses”, that is, buying and selling houses for profit in a short period of time, or of house improvements (on the latter, see Goetzmann and Spiegel 1995). A simple solution would be to drop these sales from the sample. In this article, we instead include a variable containing the reciprocal of the time between repeat sales, as an alternative to dropping sales within the first 6 months.
This article is structured as follows. Section “Repeat Sales Models and Small Samples” discusses existing methods for reducing the impact of transaction price noise on repeat sales price indices in small samples. Section “A Local Linear Trend Repeat Sales Model” describes the local linear trend repeat sales model, which also includes a term for the time between repeat sales, as an alternative to the approach of Goetzmann (1992). The same section provides some background on structural time series models, their relation to nonparametric methods, and references to real estate applications. Section “Estimation” explains the estimation approach for the LLT repeat sales model. Section “Application” begins with a description of the Kadaster database, containing all selling prices of houses in the Netherlands in the period from January 1993 to May 2009. It continues with a comparison of price indices from the LLT repeat sales model with indices based on the frequently used models of Case and Shiller (1987) and Goetzmann (1992). Price indices are compared for different subsets, ranging from all residential selling prices in the Netherlands to a small subset in a specific area code. Finally, the impact of revision in repeat sales price indices is examined for both models. Section “Conclusions” concludes.
Repeat Sales Models and Small Samples
General Specification
Instead of simply excluding sales within a short time period or including a constant, an additional term \((t_i-s_i)^{-1}\gamma_1\) is introduced in Eq. 3, which makes the model more flexible. For s_{i} close to t_{i} the term \((t_i-s_i)^{-1}\) is large, and for s_{i} far from t_{i} it approaches zero. Section “Application” provides empirical evidence that the average periodic return is a decreasing function of the time between sales, which is well approximated by the functional form \((t_i-s_i)^{-1}\gamma_1\). In other words, high returns are realized when this period is relatively short. Reasons for these high periodic returns are speculation and/or fix-ups. In the Netherlands there is an additional reason: the transfer tax reduction for resales within 6 months, see Section “Application”. In our applications we explore several variants of Eq. 3, with and without γ_{0} and γ_{1}.
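To make the shape of this term concrete, the sketch below evaluates the non-temporal part γ_0 + (t − s)^{-1}γ_1 of a repeat-sales pair for a few holding periods. It is a minimal illustration, assuming time is measured in months; the function name and the value of γ_1 are hypothetical placeholders, not estimates from this article.

```python
def non_temporal_terms(t: int, s: int, gamma0: float = 0.0, gamma1: float = 0.0) -> float:
    """Return gamma0 + gamma1 / (t - s) for a pair sold at periods s < t (months)."""
    if t <= s:
        raise ValueError("the second sale must come after the first sale")
    return gamma0 + gamma1 / (t - s)


# Hypothetical coefficient, chosen only to show the shape of the term.
GAMMA1 = 0.1
for months in (1, 3, 6, 12, 60, 120):
    extra = non_temporal_terms(t=months, s=0, gamma1=GAMMA1)
    print(f"{months:4d} months between sales: extra log return {extra:.4f}")
```

The extra return equals γ_1 for a one-month resale and shrinks towards zero for long holding periods, while γ_0 shifts every pair by the same amount regardless of the holding period.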
Small Samples
In the repeat sales models, the time effect is specified by a simple dummy variable approach with fixed parameters β_{t}. Conditional on μ_{i} and α_{it}, the estimate of β_{t} is the average (log) selling price at time t. This means that the estimate of β_{t} does not depend on preceding and subsequent periods. However, the estimate of β_{t} is sensitive to transaction price noise, in particular in small samples where the number of transactions per period is low. This happens, for example, with local price indices, short time periods, and/or severe outliers, i.e. transaction prices that differ from the true market value by a large amount. The resulting price indices may then become very volatile.
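A back-of-the-envelope sketch of why thin periods are noisy: if β_t were estimated as a plain average of n log prices, each with noise standard deviation σ, its standard error would be σ/√n. This ignores the pairing and the house-specific random walk of the actual repeat sales estimator, so it only indicates orders of magnitude; σ = 0.075 is taken from the national estimate reported later in this article, and the helper name is hypothetical.

```python
import math


def se_of_period_average(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent log prices with noise s.d. sigma."""
    if n < 1:
        raise ValueError("need at least one transaction in the period")
    return sigma / math.sqrt(n)


SIGMA = 0.075  # per-transaction noise s.d., roughly the national estimate
for n in (1, 4, 25, 100, 2500):
    print(f"{n:5d} transactions -> standard error {se_of_period_average(SIGMA, n):.4f}")
```

With a handful of sales per period the noise in β̂_t is of the same order as a full year's price change, while thousands of sales bring it down to a fraction of a percent.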
In order to reduce the impact of transaction price noise on the estimate of β_{t}, different methods have been proposed. A first group of methods consists of a two-step procedure. In the first step, the log price indices (or periodic returns) are estimated from a version of the repeat sales model given by Eqs. 2–3. Let \(\hat{\beta}_t\) denote the estimated log price indices. In the second step, these estimates \(\hat{\beta}_t\) are fed into a smoothing algorithm such as locally weighted regression. Examples are provided by Cleveland (1979) and by Wand and Jones (1995, Chapter 5), who discuss the more general local polynomial kernel estimators. Along similar lines, Clapp (2004) applies local polynomial regression in a space-time model to construct local house price indices.
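The two-step procedure can be sketched as follows. Step 1 is replaced here by simulated noisy estimates \(\hat{\beta}_t\) around a hypothetical index; step 2 applies a Gaussian-kernel Nadaraya–Watson smoother, i.e. a local constant fit, which is a simpler relative of the locally weighted regression of Cleveland (1979). The bandwidth, the noise level and the function names are arbitrary choices for illustration.

```python
import math
import random


def kernel_smooth(values, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel over the time index."""
    n = len(values)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed


def sd_of_changes(series):
    """Sample standard deviation of first differences (period-to-period returns)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))


# Step 1 (simulated): noisy first-step estimates around a hypothetical smooth log index.
rng = random.Random(42)
true_index = [0.005 * t for t in range(60)]
beta_hat = [b + rng.gauss(0.0, 0.03) for b in true_index]

# Step 2: smooth the first-step estimates. Every beta_hat gets the same kernel
# weight pattern, whatever its precision.
beta_smooth = kernel_smooth(beta_hat, bandwidth=3.0)
print(f"s.d. of returns: raw {sd_of_changes(beta_hat):.4f}, smoothed {sd_of_changes(beta_smooth):.4f}")
```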
The main drawback of this two-step procedure is that it does not take into account the uncertainty in the estimates of β_{t}: it disregards both the precision of each estimate \(\hat{\beta}_t\) and the covariances Cov(\(\hat{\beta}_s,\hat{\beta}_t\)). The precision differs over time because the number of observations differs from one period to another, especially in small samples. Another concern is the behavior of the kernel at the boundaries, i.e. at the beginning and the end of the time period, because there one side of the kernel window contains no data.
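The boundary problem is visible even without noise. In the sketch below (a Gaussian-kernel smoother applied to a hypothetical, noise-free linear log index), the smoothed series matches the trend in the middle of the sample but is pulled towards the interior at both ends, because the kernel window there only contains data on one side.

```python
import math


def kernel_smooth(values, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel over the time index."""
    n = len(values)
    out = []
    for t in range(n):
        w = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        out.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return out


# Noise-free, steadily rising log index: any deviation after smoothing is kernel bias.
trend = [0.01 * t for t in range(40)]
smooth = kernel_smooth(trend, bandwidth=4.0)
bias = [s - b for s, b in zip(smooth, trend)]
print(f"bias at start {bias[0]:+.4f}, middle {bias[20]:+.4f}, end {bias[-1]:+.4f}")
```

The start of the index is biased upwards and the end downwards, by roughly three months of growth in this example, which is exactly where up-to-date index values matter most.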
A more recent approach in order to handle small datasets in repeat sales models is provided by Baroni et al. (2007). They propose a principal components analysis (PCA) factor repeat sales index, which exploits the relation between the house price indices and other economic and financial explanatory variables. One drawback of this approach is that the estimated price indices depend on the included set of explanatory variables.
A third way to manage a small number of observations is to replace the dummy variables by a smooth (continuous) deterministic trend function, for example, a cubic spline. A slightly more flexible and equally easy-to-implement alternative is the Fourier form approach, which depends on only a few parameters. For more details, see McMillen and Dombrow (2001) and McMillen and McDonald (2004). This method has also been applied in the hedonic price model literature, see for example Thorsnes and Reifel (2007).
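As an illustration of the Fourier form, the sketch below builds the regressors for one repeat-sales pair when the time dummies are replaced by a low-order Fourier expansion in rescaled time z = 2πt/T_h, with terms z, z², sin(qz) and cos(qz). This follows the general idea of McMillen and Dombrow (2001); the exact scaling, the order and the function names are assumptions made here for illustration. Because the model is in differences, a pair's regressors are basis(t) − basis(s), so the house level cancels.

```python
import math


def fourier_basis(t, horizon, order):
    """Terms z, z^2, sin(qz), cos(qz) for q = 1..order, with z = 2*pi*t/horizon."""
    z = 2.0 * math.pi * t / horizon
    terms = [z, z * z]
    for q in range(1, order + 1):
        terms += [math.sin(q * z), math.cos(q * z)]
    return terms


def repeat_sales_row(t, s, horizon, order):
    """Regressors for a pair sold at s and t: basis(t) - basis(s)."""
    bt = fourier_basis(t, horizon, order)
    bs = fourier_basis(s, horizon, order)
    return [a - b for a, b in zip(bt, bs)]


# Hypothetical pair sold in months 12 and 48 of a 197-month sample, order-2 expansion.
row = repeat_sales_row(t=48, s=12, horizon=197, order=2)
print(len(row), [round(x, 3) for x in row])
```

With order 2 the whole trend is described by six parameters instead of one dummy per month.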
The structural time series approach applied in this article can be interpreted as a generalization of the Goetzmann (1992) approach. Firstly, structural time series models allow more general specifications of the prior than Goetzmann’s approach. In the random walk with drift specification, the a priori assumption is that the drift term (κ) is constant over time. However, in successive periods of appreciation and depreciation of the price levels, this assumption is not valid. A more appropriate specification would allow κ in Eq. 4 to change over time. An example of such a model is the local linear trend model, which is explored in more detail in Section “A Local Linear Trend Repeat Sales Model”.
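The difference between the two priors is easiest to see by simulating them. The sketch draws a log price index from a local linear trend, β_{t+1} = β_t + κ_t + ζ_t and κ_{t+1} = κ_t + ξ_t, where ζ drives the level and ξ the slope (consistent with the variance ratios q_ζ and q_ξ used in this article); setting the slope disturbance to zero gives the random walk with constant drift. All parameter values and names are hypothetical.

```python
import random


def simulate_trend(T, kappa1, sd_level, sd_slope, seed=0):
    """Simulate beta_1..beta_T and kappa_1..kappa_T with beta_1 = 0.

    beta_{t+1} = beta_t + kappa_t + zeta_t,  zeta_t ~ N(0, sd_level^2)
    kappa_{t+1} = kappa_t + xi_t,            xi_t   ~ N(0, sd_slope^2)
    sd_slope = 0 reduces this to a random walk with constant drift kappa1.
    """
    rng = random.Random(seed)
    beta, kappa = 0.0, kappa1
    betas, kappas = [beta], [kappa]
    for _ in range(T - 1):
        beta = beta + kappa + rng.gauss(0.0, sd_level)
        kappa = kappa + rng.gauss(0.0, sd_slope)
        betas.append(beta)
        kappas.append(kappa)
    return betas, kappas


# Hypothetical monthly values: drift of 0.5% per month.
rwd_index, rwd_slopes = simulate_trend(120, kappa1=0.005, sd_level=0.002, sd_slope=0.0, seed=1)
llt_index, llt_slopes = simulate_trend(120, kappa1=0.005, sd_level=0.002, sd_slope=0.001, seed=1)
print("RWD drift stays at", rwd_slopes[-1], "; LLT drift ends at", round(llt_slopes[-1], 4))
```

Under the random walk with drift the slope never moves, so a period of falling prices can only be produced by a run of negative level shocks; under the local linear trend the slope itself can turn negative.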
The second generalization concerns the estimation of the signal-to-noise ratio, given by q_{ζ}. In Goetzmann’s approach, the variances σ^{2} and \(q_\zeta \sigma^2\) are estimated in an initial step, which sometimes leads to biased estimates of the variances. The resulting signal-to-noise ratio is then plugged into the second step of the Bayesian procedure. However, as shown in Section “Estimation”, it is possible to compute the concentrated log-likelihood and to estimate the signal-to-noise ratio parameters directly by maximization. This can be applied in the Bayesian procedure as well as in the structural time series approach, avoiding the somewhat ad hoc two-step procedure.
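To show what estimating a signal-to-noise ratio directly by maximization involves, the sketch below computes the concentrated log-likelihood of the simplest structural time series model, the local level model y_t = μ_t + ε_t, μ_{t+1} = μ_t + η_t with Var(η_t) = q·Var(ε_t), and picks q by a grid search. It is not the repeat sales likelihood of this article; it only illustrates the principle that σ² can be concentrated out so that the likelihood depends on q alone. The filter follows the standard textbook treatment (e.g. Durbin and Koopman 2001), with the level initialised from the first observation; the function names and the grid are hypothetical.

```python
import math


def concentrated_loglik(y, q):
    """Concentrated log-likelihood of the local level model for signal-to-noise ratio q.

    The Kalman filter is run with Var(eps) = 1; the ML estimate of Var(eps) is then
    plugged back in. The level is initialised from y[0] (diffuse start), so the
    likelihood is built from the n - 1 remaining prediction errors.
    """
    a, p = y[0], 1.0 + q            # predicted level and its scaled variance for y[1]
    sum_v2_f, sum_log_f = 0.0, 0.0
    for obs in y[1:]:
        v = obs - a                 # one-step-ahead prediction error
        f = p + 1.0                 # scaled prediction error variance
        sum_v2_f += v * v / f
        sum_log_f += math.log(f)
        k = p / f                   # Kalman gain
        a += k * v
        p = p * (1.0 - k) + q
    n = len(y) - 1
    sigma2_hat = sum_v2_f / n       # concentrated-out ML estimate of Var(eps)
    return -0.5 * n * (math.log(2.0 * math.pi) + 1.0 + math.log(sigma2_hat)) - 0.5 * sum_log_f


def estimate_q(y, grid):
    """Grid-search maximiser of the concentrated log-likelihood."""
    return max(grid, key=lambda q: concentrated_loglik(y, q))


grid = [0.001, 0.01, 0.1, 1.0, 10.0]
zigzag = [1.0, -1.0] * 10          # pure noise around a constant level
print(estimate_q(zigzag, grid))
```

A series that only jitters around a constant favours a small q (a smooth trend), whereas a series dominated by persistent level shifts favours a large one.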
In contrast to the dummy variable approach, the structural time series model enables prediction of the price level based on preceding and subsequent information. This means that an estimate of the price level can be provided even for time periods in which no observations are available. The use of a structural time series model results in a more stable price index and (partly) reduces the systematic downward revisions found in repeat sales indices, see for example Clapp and Giaccotto (1999) and Clapham et al. (2006). Another advantage of these models is that price indices can also be provided by continuous time models, which avoids the problem of temporal aggregation; for an example, see Englund et al. (1999).
A Local Linear Trend Repeat Sales Model
Model Specification
The local linear trend repeat sales model is provided by Eqs. 2, 3, 5 and 6. The initial value of κ is an unknown parameter, say κ_{1}. Similar to the standard repeat sales model, we assume for the purpose of identification that β_{1} = 0.
In order to interpret β_{t} as the common trend, we have to impose that the individual house trends sum to zero, i.e. \(\sum_{i=1}^{M}\alpha_{it}=0\) for t = 1,...,T. A simpler, equivalent approach is to define the common trend d_{t} as the sum of the common trend β_{t} and the average individual house trend, such that \(d_t = \beta_t + \frac{1}{M}\sum_{i=1}^{M}\alpha_{it}\). If \(\sigma^2_d=\infty\), then \(\sum_{i=1}^{M}\alpha_{it}=0\) for all t. Note that for the standard repeat sales model, for which q_{ξ} = ∞ or q_{ζ} = ∞, it holds that \(\sum_{i=1}^{M}\alpha_{it}=0\). In practice the term \(M^{-1}\sum_{i=1}^{M}\alpha_{it}\) is negligible.
Structural Time Series Models
In state space form, the unobserved components can be estimated by the Kalman filter algorithm. The Kalman filter also produces the likelihood function, which enables the estimation of the variance parameters. The Ox package SsfPack contains ready-to-use procedures for estimating state space models; it can be downloaded free of charge for academic research and teaching purposes, see Koopman et al. (1999). In this article, for reasons of experience and practice, the statistical program Gauss is used to estimate the local linear trend repeat sales model.
Different models can be compared by likelihood-based criteria;
Inference about the parameters, including the signal-to-noise ratio, can be based on the likelihood;
Appropriate weights are implicitly provided. The weights depend on the position of the observations (beginning, middle, or end of the series) and on the magnitude of outlying observations. However, they are not necessarily symmetric;
Root mean square errors can be computed for the estimated trend;
The models can be made robust to outliers by specifying t-distributions;
By formulating a model in continuous time, the optimal weighting for irregularly spaced observations is automatically carried out.
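For readers who want to see what the filter computes, the following is a minimal pure-Python Kalman filter for a single series following a local linear trend observed with noise, y_t = β_t + ε_t. It returns the filtered level and slope and the Gaussian log-likelihood on which the likelihood-based comparisons and inference listed above rest. It is a sketch only: it is not the interface of SsfPack or Gauss, the diffuse start is approximated by a large initial variance (so the first two prediction errors are left out of the likelihood), and the function name is hypothetical. The repeat sales model itself cannot be cast directly in this single-series form (see Section “Estimation”).

```python
import math


def llt_kalman_filter(y, var_eps, var_level, var_slope, diffuse=1e7):
    """Kalman filter for y_t = beta_t + eps_t with a local linear trend:
    beta_{t+1} = beta_t + kappa_t + zeta_t,  kappa_{t+1} = kappa_t + xi_t.

    Returns the filtered (beta_t, kappa_t) after each observation and the Gaussian
    log-likelihood, excluding the first two (approximately diffuse) prediction errors.
    """
    a = [0.0, 0.0]                               # predicted state [beta, kappa]
    P = [[diffuse, 0.0], [0.0, diffuse]]         # its covariance
    filtered, loglik = [], 0.0
    for t, obs in enumerate(y):
        v = obs - a[0]                           # prediction error (Z = [1, 0])
        f = P[0][0] + var_eps                    # prediction error variance
        upd = [a[0] + P[0][0] * v / f, a[1] + P[1][0] * v / f]   # filtered state
        filtered.append((upd[0], upd[1]))
        if t >= 2:
            loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
        # Filtered covariance: P - P Z'Z P / f (P is symmetric).
        p00 = P[0][0] - P[0][0] * P[0][0] / f
        p01 = P[0][1] - P[0][0] * P[0][1] / f
        p11 = P[1][1] - P[0][1] * P[0][1] / f
        # Prediction with transition T = [[1, 1], [0, 1]]: a = T a, P = T P T' + Q.
        a = [upd[0] + upd[1], upd[1]]
        P = [[p00 + 2.0 * p01 + p11 + var_level, p01 + p11],
             [p01 + p11, p11 + var_slope]]
    return filtered, loglik


# Illustration on a hypothetical series: a noise-free line with slope 0.5.
line = [1.0 + 0.5 * t for t in range(30)]
states, ll = llt_kalman_filter(line, var_eps=1e-4, var_level=1e-8, var_slope=1e-8)
print(round(states[-1][0], 4), round(states[-1][1], 4))
```

Maximizing the returned log-likelihood over the three variances is the standard route to the variance parameters in a single-series setting.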
Real Estate Applications of Structural Time Series Models
Structural time series models, or more generally state space models, have already been used in real estate applications. Schwann (1998) estimates a hedonic price index for a thin market using a Kalman filter, where the periodic returns \(\mathit{\Delta} \beta_t\) follow a stationary autoregressive process. Francke and De Vos (2000) estimate a hierarchical trend model, in which different trends are simultaneously estimated for different market segments. The trend specification is decomposed into a common trend, a region-specific trend, and a house-type-specific trend. The region-specific and house-type-specific trends are modeled as deviations from the common trend. These models are efficiently estimated by combining ordinary least squares and the Kalman filter, see also Francke and Vos (2004) and Francke (2008). Schulz and Werwatz (2004) provide a state space model for house prices in Berlin. They include explanatory variables such as inflation rates, mortgage rates, and building permits in order to model the common price movement. Hannonen (2005, 2008) uses a structural time series model to analyze and predict urban land prices. To our knowledge, state space models and the Kalman filter have not previously been used to estimate repeat sales price indices.
Estimation
A structural time series model can be put into state space form and efficiently estimated by the Kalman filter, see for example Durbin and Koopman (2001). In the local linear trend repeat sales model, however, the size of the state vector, i.e. the number of unknown parameters apart from the variances, becomes very large: it equals M + T + 2, where M is the number of houses, T the number of time periods, and the remaining two elements correspond to γ_{0} and γ_{1}. In the application provided in the next section the number of houses is approximately 500,000. Including all these variables in the state vector is not feasible, as it would require the storage and inversion of 500,000 × 500,000 matrices.
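The scale problem is easy to quantify. The arithmetic below assumes monthly periods over January 1993 to May 2009 (T = 197) and 8-byte floating point numbers; the house count is the approximate figure quoted above, and the helper name is hypothetical.

```python
def dense_matrix_gigabytes(n: int, bytes_per_entry: int = 8) -> float:
    """Memory for one dense n-by-n matrix of 8-byte floats, in gigabytes (1e9 bytes)."""
    return n * n * bytes_per_entry / 1e9


M = 500_000            # approximate number of houses
T = 197                # monthly periods, January 1993 to May 2009
state_dim = M + T + 2  # house levels, period effects, and gamma_0, gamma_1
print(state_dim, round(dense_matrix_gigabytes(state_dim)))
```

A single state covariance matrix of this size would take about 2 terabytes, before any inversion is attempted.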
As shown in the previous section, an alternative to the repeat sales model in levels (Eq. 1) is the specification in ‘first differences’ (3), in which the M levels μ_{i} cancel out. Unfortunately, the model given by (2), (3), (5), and (6) cannot (easily) be put into state space form, because the data depend on the difference of the state vector at two points in time, with varying time spans. The state space approach assumes that the state vector is a Markov chain. Therefore, we have to rely on another estimation procedure.
One option is to use the Expectation Maximization (EM) algorithm for the model in levels as given in Eq. 1. Conditional on μ_{i}, the Kalman filter can straightforwardly be applied to estimate the log price index β_{t} and other parameters. In an additional step, the parameters μ_{i} can be estimated by means of the EM algorithm. This results in an iterative estimation procedure that is guaranteed to converge to at least a local optimum, see Dempster et al. (1977). The EM algorithm for state space models was proposed by Shumway and Stoffer (1982) and Watson and Engle (1983). The main advantage of this approach is that more general time specifications, including, for example, more complex trend specifications and seasonal components, can easily be dealt with.
In this article a different approach for the estimation of the local linear trend repeat sales model is put forward. The local linear trend repeat sales model ‘in differences’ is estimated by an empirical Bayesian procedure. The model can then be expressed as a linear regression model with a prior for β, induced by the local linear trend model (5)–(6). Conditional on the parameters ρ,q_{η},q_{ζ}, and q_{ξ}, the posteriors for β and σ easily follow. Estimates of ρ,q_{η},q_{ζ}, and q_{ξ} are obtained by maximizing the likelihood of the ‘differenced’ data.
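A stripped-down version of this idea fits in a few lines. The sketch below treats each repeat sale as a regression of the log price difference on β_t − β_s (with β_1 = 0), assumes unit error variances (so it ignores the house-specific random walk and the γ terms), and replaces the local linear trend prior (5)–(6) by a simpler driftless random-walk prior with variance ratio q. The posterior mean is then (X'X + D'D/q)^{-1}X'Δy, where D takes first differences of β. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some periods are not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def posterior_mean_index(pairs, T, q=None):
    """Posterior mean of beta_2..beta_T (beta_1 = 0) from repeat-sales pairs.

    pairs: (s, t, dy) with dy = log price at t minus log price at s, 1 <= s < t <= T.
    q: prior variance of beta_t - beta_{t-1} relative to the error variance
       (driftless random-walk prior); q=None means no prior, i.e. least squares.
    """
    k = T - 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for s, t, dy in pairs:
        x = [0.0] * k
        x[t - 2] += 1.0                  # t > s >= 1, so t >= 2
        if s > 1:
            x[s - 2] -= 1.0              # beta_1 = 0 has no column
        for i in range(k):
            xty[i] += x[i] * dy
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    if q is not None:
        # Add the prior precision D'D / q, where D maps (beta_2, ..., beta_T)
        # to the first differences (beta_2 - 0, beta_3 - beta_2, ...).
        for i in range(k):
            xtx[i][i] += (2.0 if i < k - 1 else 1.0) / q
            if i + 1 < k:
                xtx[i][i + 1] -= 1.0 / q
                xtx[i + 1][i] -= 1.0 / q
    return solve(xtx, xty)


# Period 3 has no sales at all; the prior still gives it a value.
gappy = [(1, 2, 0.10), (2, 4, 0.20), (1, 4, 0.30)]
print(posterior_mean_index(gappy, T=4, q=1.0))
```

Without a prior (q=None) the function reproduces the least squares repeat sales estimates and fails when a period has no sales; with the prior, an empty interior period is filled in exactly halfway between its neighbours, which is the sense in which preceding and subsequent information is used.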
1. Conditional on θ, estimates of δ and σ^{2} are provided by Eqs. 14–17. The terms \(\sum_{i=1}^{M}\widetilde{y}_i'\widetilde{\Omega}_i^{-1}\widetilde{y}_i\), \(\sum_{i=1}^{M}\widetilde{y}_i'\widetilde{\Omega}_i^{-1}\widetilde{Z}_i\), \(\sum_{i=1}^{M}\widetilde{Z}_i'\widetilde{\Omega}_i^{-1}\widetilde{Z}_i\), and \(\sum_{i=1}^{M}\log|\widetilde{\Omega}_i|\) can be computed per house. The precision matrix Ψ^{-1} follows from Eq. 12.
2. The parameters θ can be estimated by maximizing the likelihood function (18). All terms in the likelihood function are available from step 1.
3. Finally, the log price index and the log return are given by \((t-1)\kappa_1^{\ast} + \beta_t^{\ast}\) and \(\kappa_1^{\ast} + \beta_t^{\ast} - \beta_{t-1}^{\ast}\), respectively. The corresponding variances (and covariances) can be computed straightforwardly from Eqs. 14–15. The price indices and returns are obtained by taking the antilog and have a lognormal distribution.
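Step 3 is simple arithmetic once κ_1^* and the β_t^* are available. The sketch below (hypothetical inputs and function name) builds the log index and log returns as defined in step 3 and takes the antilog. One point worth making explicit: for a lognormal variable, exp(m) is the median, while the mean is exp(m + v/2) when the log index has variance v; the optional variance argument shows the difference.

```python
import math


def index_and_returns(kappa1, beta_star, var_log_index=None):
    """Log index (t-1)*kappa1 + beta*_t, log returns, and the index in levels.

    beta_star[0] corresponds to t = 1. var_log_index, if given, holds the variance
    of each log index value and is used for the lognormal mean exp(m + v/2).
    """
    log_index = [(t - 1) * kappa1 + b for t, b in enumerate(beta_star, start=1)]
    log_returns = [kappa1 + beta_star[t] - beta_star[t - 1] for t in range(1, len(beta_star))]
    median_index = [math.exp(m) for m in log_index]      # antilog of the log index
    mean_index = None
    if var_log_index is not None:
        mean_index = [math.exp(m + 0.5 * v) for m, v in zip(log_index, var_log_index)]
    return log_index, log_returns, median_index, mean_index


# Hypothetical inputs: 1% monthly drift plus small deviations beta*_t.
log_idx, rets, med, mean = index_and_returns(0.01, [0.0, 0.002, -0.001], [0.0, 0.0001, 0.0002])
print([round(x, 4) for x in log_idx], [round(r, 4) for r in rets])
```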
Note that for q_{ζ} = q_{ξ} = ∞ in Eq. 12, the precision matrix Ψ^{-1} = 0, which corresponds to no prior information. Therefore, conditional on θ_{1}, the estimation results coincide with those of standard repeat sales models. When q_{ξ} = 0 in Eq. 12, conditional on θ_{1} and q_{ζ}, the estimation results are equivalent to the approach of Goetzmann (1992).
The main difference between Goetzmann’s approach and the local linear trend repeat sales model is the estimation of the parameters θ. In Goetzmann’s approach, σ^{2} and \(q_{\zeta} \sigma^2\) are estimated in a two-step procedure, whereas in the local linear trend repeat sales model they are estimated in one step by maximum likelihood.
The slope parameters κ_{t} can be estimated in a similar fashion as the trend parameters β_{t}. The computation requires submatrices already computed in step 1. More details can be found in Appendix B.
Application
Data
sales between relatives;
transactions where the buyer is a legal entity;
if the same lot is sold more than once in one transaction;
no full ownership or long lease;
more than one purchase price in one transaction;
unlikely purchase price.
Overview of transactions in the period January 1993–May 2009
Description  Number
Total number of selling prices  3,481,390 
Total number of selling prices after screening  3,188,622 
Number of selling prices (at least two sales of the same house)  1,536,407 
Number of different houses  643,904 
Municipalities  443 
Zip code (4 digits)  3,804 
Number of sales
Number of sales  Number of houses
2  455,503 
3  140,701 
4  37,428 
5  8,393 
6  1,584 
7  251 
8  38 
9  4 
10  2 
House types
House type  Number of observations 

Apartments  511,943 
Terraced houses and corner houses  737,754 
Semi-detached houses  151,347
Detached houses  135,363 
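The three breakdowns above are mutually consistent, which also clarifies the bookkeeping behind N and M: the counts by number of sales sum to the number of different houses (M), weighting them by the number of sales gives the number of selling prices of houses sold at least twice (N), and each house sold k times contributes k − 1 repeat-sale pairs, so the database contains N − M pairs. The short check below uses only the figures in the tables above.

```python
# Figures copied from the overview tables above.
houses_by_number_of_sales = {2: 455_503, 3: 140_701, 4: 37_428, 5: 8_393, 6: 1_584,
                             7: 251, 8: 38, 9: 4, 10: 2}
prices_by_house_type = {"apartments": 511_943, "terraced and corner houses": 737_754,
                        "semi-detached houses": 151_347, "detached houses": 135_363}

M = sum(houses_by_number_of_sales.values())                             # different houses
N = sum(k * n for k, n in houses_by_number_of_sales.items())            # selling prices
pairs = sum((k - 1) * n for k, n in houses_by_number_of_sales.items())  # repeat-sale pairs

print(M, N, pairs, sum(prices_by_house_type.values()))
```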
The database is used by the Kadaster to construct a monthly weighted repeat sales index, based on the method by Case and Shiller (1987). Indices are provided on a national level as well as on regional and house type levels. More details on the index construction method and the database can be found in Jansen et al. (2008).
In 2008 the weighted repeat sales index was replaced by a monthly Sales Price Appraisal Ratio (SPAR) index. The index is published by CBS (Statistics Netherlands) in cooperation with the Kadaster. The appraisal value is the WOZ value (Waardering Onroerende Zaken), a yearly assessed value used for property tax. The WOZ law requires that the determined appraisal value is also used for other legal purposes, such as the levies that the water boards can raise and income taxes levied by the central government. The SPAR method has been applied in New Zealand since the early 1960s, see Bourassa et al. (2006). A more general treatment of assessed value price index methods is provided by Clapp and Giaccotto (1992a, b). De Vries et al. (2007) provide a detailed description of the application of the SPAR method in the Netherlands.
Comparison of Indices
a. all selling prices in the Netherlands (846,439 observations);
b. the selling prices of semi-detached houses (70,471 observations);
c. the selling prices in the small city of Maarssen (2,234 observations);
d. the selling prices in a specific area code (991 observations).
Estimation results from repeat sales models for different subsets
Subset / parameter  Case–Shiller (CS)  Goetzmann  Random walk with drift (RWD)  Local linear trend (LLT)
The Netherlands
12κ_{1}  –  0.076 (18.95)  0.076 (18.47)  0.067 (4.08)
σ  0.075  0.075  0.075  0.075
\(\sqrt{q_{\eta}}\sigma\)  0.015  0.015  0.015  0.015
\(\sqrt{q_{\zeta}}\sigma\)  –  0.005  0.005  0.000
\(\sqrt{q_{\xi}}\sigma\)  –  –  –  0.001
st. dev. \(\mathit{\Delta} \hat{\beta}_t\)  0.0047  0.0043  0.0043  0.0039
Log-likelihood  389,010.0  389,766.3  389,766.3  389,854.7
N − M  846,439
Semi-detached
12κ_{1}  –  0.077 (9.73)  0.077 (12.98)  0.166 (7.07)
σ  0.076  0.076  0.076  0.076
\(\sqrt{q_{\eta}}\sigma\)  0.015  0.015  0.015  0.015
\(\sqrt{q_{\zeta}}\sigma\)  –  0.009  0.007  0.002
\(\sqrt{q_{\xi}}\sigma\)  –  –  –  0.001
st. dev. \(\mathit{\Delta} \hat{\beta}_t\)  0.0092  0.0057  0.0050  0.0041
Log-likelihood  33,150.0  33,156.8  33,190.5
N − M  80,162
Maarssen
12κ_{1}  –  0.066 (2.22)  0.068 (7.10)  0.094 (3.96)
σ  0.045  0.045  0.044  0.044
\(\sqrt{q_{\eta}}\sigma\)  0.011  0.011  0.011  0.011
\(\sqrt{q_{\zeta}}\sigma\)  –  0.035  0.011  0.009
\(\sqrt{q_{\xi}}\sigma\)  –  –  –  0.000
st. dev. \(\mathit{\Delta} \hat{\beta}_t\)  0.0345  0.0164  0.0054  0.0046
Log-likelihood  1,644.4  2,024.2  2,065.6  2,068.8
N − M  2,511
Zip code 3076
12κ_{1}  –  0.085 (1.11)  0.081 (6.01)  0.126 (4.21)
σ  0.063  0.063  0.071  0.072
\(\sqrt{q_{\eta}}\sigma\)  0.014  0.014  0.013  0.013
\(\sqrt{q_{\zeta}}\sigma\)  –  0.090  0.015  0.009
\(\sqrt{q_{\xi}}\sigma\)  –  –  –  0.001
st. dev. \(\mathit{\Delta} \hat{\beta}_t\)  0.0895  0.0429  0.0069  0.0043
Log-likelihood  280.4  484.9  546.5  549.6
N − M  991
In comparison, in Fig. 3d, the differences between the zip code area indices are substantial. The LLT price index is smooth, and is virtually the same as the RWD price index, while the CS index is very volatile, due to the small number of observations. The standard deviations of \(\mathit{\Delta} \hat{\beta_t}\) for the CS, Goetzmann, RWD, and LLT models are 0.0895, 0.0429, 0.0069, and 0.0043, respectively.
For the semi-detached houses the differences between the indices are not substantial, but the CS price index is slightly irregular, see Fig. 3b. As can be seen from Fig. 3c, the city-level price indices show the same pattern as the zip code indices, although less extreme.
Note that in Fig. 3c the RWD and LLT price indices are above the CS and Goetzmann indices, whereas in Fig. 3d it is the other way around. This results from the fact that the CS price index is much more sensitive to outliers than the RWD and LLT indices, particularly at the beginning of the period, where the log price index value is assumed to be zero. This also holds for the Goetzmann index, although to a lesser extent. In the CS repeat sales model the outliers are absorbed by the initial price levels.
Other results from Table 5 are as follows. The standard deviation σ is approximately 7.5%, except for the city of Maarssen, where it is only 4.4%. Note that this is the standard deviation of an individual house transaction, so it should not be larger for smaller numbers of observations. The standard deviations of the individual house random walks, \(\sqrt{q_{\eta}}\sigma\), are approximately 1.5% and relatively constant across the models and analysed samples.
The estimated values of \(\sqrt{q_{\zeta}}\sigma\) are identical for the Goetzmann and RWD models in the national Dutch data set. For the zip code area data, however, the estimates are very different: 0.090 (Goetzmann) versus 0.015 (RWD). This is also reflected in the standard deviations of \(\mathit{\Delta} \hat{\beta}_t\): 0.0429 (Goetzmann) versus 0.0069 (RWD). Note that for the RWD and LLT models the standard deviation of \(\mathit{\Delta} \hat{\beta}_t\) is not very sensitive to the sample size, whereas for the CS and Goetzmann models it grows as the number of observations shrinks: in the CS (Goetzmann) model it ranges from 0.0047 (0.0043) for the Netherlands to 0.0895 (0.0429) at the area code level. These high standard deviations imply that the CS and Goetzmann models cannot be used to construct detailed price indices and returns. At the area code level, the monthly standard deviations in the CS and Goetzmann models are respectively more than 13 and 6 times as large as the average monthly return, given an average yearly return of about 0.08. For the RWD and LLT models these figures are more reasonable: the monthly standard deviation at the area code level is respectively 1.0 and 0.6 times the average monthly return.
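The ratios quoted above follow from dividing the area-code standard deviations of \(\mathit{\Delta} \hat{\beta}_t\) in Table 5 by the average monthly return, taken here as 0.08/12 (the approximate yearly return mentioned in the text):

```python
AVERAGE_MONTHLY_RETURN = 0.08 / 12   # approximate yearly return of 0.08, spread over 12 months

# Standard deviations of the monthly log-index changes at the area code level (Table 5).
sd_area_code = {"CS": 0.0895, "Goetzmann": 0.0429, "RWD": 0.0069, "LLT": 0.0043}

ratios = {model: sd / AVERAGE_MONTHLY_RETURN for model, sd in sd_area_code.items()}
for model, ratio in ratios.items():
    print(f"{model:10s} {ratio:5.2f} x average monthly return")
```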
In all samples, the local linear trend model has a higher log-likelihood than the Goetzmann and RWD models, at the cost of only one additional parameter, q_{ξ}. The log-likelihoods of the Goetzmann and RWD models are identical for the national Dutch data set. At the city and area code levels, the RWD model has a substantially higher log-likelihood than the Goetzmann model: the two-step procedure results in suboptimal estimates of σ and q_{η}. These suboptimal estimates result in more volatile log price indices than the RWD price indices, for which the maximum likelihood estimates of the parameters are used.
The estimates of the slope parameters κ_{t} can also be used for tracking turning points. For the LLT model, a turning point can alternatively be defined as a change of the slope parameter from a positive (negative) to a negative (positive) value. Following this definition, the turning point for both the Netherlands and the semi-detached houses is September 2008.
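This turning-point rule is easy to apply to a series of estimated slopes. The sketch below flags every period in which κ̂_t changes sign relative to the last non-zero slope before it; the input values and the function name are hypothetical, and skipping slopes of exactly zero is an assumption of this sketch.

```python
def turning_points(slopes):
    """Indices t at which the slope's sign differs from the last non-zero slope before t."""
    points = []
    previous_sign = 0
    for t, k in enumerate(slopes):
        sign = (k > 0) - (k < 0)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                points.append(t)
            previous_sign = sign
    return points


# Hypothetical monthly slope estimates.
kappa_hat = [0.006, 0.004, 0.001, -0.002, -0.005, -0.003, 0.001]
print(turning_points(kappa_hat))   # indices of the months where the sign flips
```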
The above examples suggest that, with many observations, the log price level estimates of the four methods coincide. When only a few observations per time period are available, the CS price index is extremely volatile and sensitive to transaction price noise, while the LLT price index remains stable. In general, the results from the RWD model are close to those of the LLT model; however, the LLT model has a better fit, as measured by the log-likelihood. With many observations, the Goetzmann and RWD approaches produce the same results. When the number of observations is small, the two-step Goetzmann estimation procedure leads to suboptimal estimates of the variances σ^{2} and \(q_\zeta \sigma^2\), and the resulting indices are more volatile than the RWD estimates. This is in line with the findings of Goetzmann (1992), who states that the two-stage Bayes procedure leads to an overestimation of q_{ζ}.
Time Between Repeat Sales
Two different semi-detached house datasets are used to examine the inclusion of the non-temporal component γ_{0} and the time-between-sales component \((t-s)^{-1}\gamma_1\). In the first dataset, all sales within six months are excluded, while in the second dataset only sales within one month are excluded, resulting in 1,843 additional observations.
Based on the first dataset, two different LLT models are estimated: (a) without an intercept (γ_{0} = 0) and (b) with an intercept (\(\gamma_0\ne0\)). In both models the time between sales component is absent (γ_{1} = 0). Based on the second dataset, another two LLT models are estimated: (c) without an intercept (γ_{0} = 0), and (d) with an intercept (\(\gamma_0\ne0\)). In both models the time between sales component (\(\gamma_1\ne0\)) is included.
Estimation results from local linear trend repeat sales models for semi-detached houses
Coefficient  γ_{0} = γ_{1} = 0  \(\gamma_0 = 0, \gamma_1 \ne 0\)  \(\gamma_0 \ne 0, \gamma_1 \ne 0\)  \(\gamma_0 \ne 0, \gamma_1 = 0\)
γ_{0}  –  –  0.032 (28.44)  0.037 (37.58)
γ_{1}  –  0.1871 (39.47)  0.097 (17.32)  –
σ  0.076  0.076  0.074  0.071
\(\sqrt{q_{\eta}}\sigma\)  0.015  0.015  0.016  0.016
\(\sqrt{q_{\zeta}}\sigma\)  0.002  0.002  0.002  0.002
\(\sqrt{q_{\xi}}\sigma\)  0.001  0.001  0.001  0.001
N − M  80,162  82,005  82,005  80,162
Log-likelihood  33,190.5  34,874.6  35,269.3  33,867.3
It can be concluded that including a constant term has a large downward impact on the estimated price indices. These results are in accord with the findings of, for example, Shiller (1993) and Clapp and Giaccotto (1999). We conclude that, rather than deleting all repeat sales within a short period and thereby ignoring information in the data, a feasible alternative is to keep all sales in the dataset and model them explicitly.
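To see what the reciprocal term implies, the sketch below evaluates the non-temporal part of the expected log return of a repeat sale, \(\gamma_0 + \gamma_1 (t-s)^{-1}\), using the point estimates reported for the model with \(\gamma_0 \ne 0, \gamma_1 \ne 0\) (γ_{0} = 0.032, γ_{1} = 0.097). It assumes, without confirmation in this section, that t − s is measured in months; the function name is illustrative.

```python
def non_temporal_component(gap: float, gamma0: float, gamma1: float) -> float:
    """gamma_0 + gamma_1 / (t - s) for a holding period gap = t - s (assumed in months)."""
    if gap <= 0:
        raise ValueError("time between sales must be positive")
    return gamma0 + gamma1 / gap


# Point estimates from the column with gamma_0 != 0 and gamma_1 != 0.
GAMMA0, GAMMA1 = 0.032, 0.097

for gap in (1, 6, 12, 60):
    print(f"t - s = {gap:2d}: {non_temporal_component(gap, GAMMA0, GAMMA1):.3f}")
```

Under the monthly assumption, the component is 0.129 for a resale after one month, 0.048 after six months and 0.040 after twelve, approaching γ_{0} as the holding period grows. This is the pattern of large short-term gains that the reciprocal variable is meant to absorb.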
Revision
Average and maximum revision effects in the repeat sales model for the city of Maarssen
           CS       Goetzmann  RWD      LLT
Average    0.0035   0.0024     0.0027   0.0021
Maximum    0.0239   0.0195     0.0133   0.0101
Prediction
To illustrate the meaning of the estimated standard deviations in Table 6, we provide an example based on the dataset of semi-detached houses, taking the model in the final column of Table 6 (\(\gamma_0 \ne 0, \gamma_1=0\)) as the base model. Suppose that house i is sold for €100,000 in May 2004, and we want to know what its value will be in May 2007; more precisely, we want the expectation and the standard deviation of that value. The components that influence the value of the house are the price trend (β_{t} and κ_{t}), the individual house trend α_{i}, and the transaction price noise ε_{it}. The estimated price increase (β_{t} and κ_{t}) is 0.136 (14.5%). Because the random walk has zero expectation, the expected value in May 2007 equals €114,500. The standard deviation consists of three independent parts: (1) the standard deviation of the measurement error (0.071), which enters twice, once for the May 2004 sale and once for the May 2007 value; (2) the standard deviation of the monthly random walk of an individual house (0.016), which accumulates over the 36 months between the two dates; and (3) the standard deviation of the price movements between May 2004 and May 2007, which can be calculated from Eq. 15 and equals 0.0038. The total standard deviation is \(\sqrt{2\times0.071^2+36\times0.016^2+0.0038^2}=0.139\), approximately 14.9%.
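The arithmetic can be reproduced in a few lines. The sketch below follows the formula term by term under its independence assumption, so the three variances simply add, and converts the log standard deviation to a percentage with \(e^{x}-1\); the constant names are illustrative.

```python
import math

SIGMA_EPS = 0.071     # transaction price noise (final column of Table 6)
SIGMA_HOUSE = 0.016   # monthly sd of the individual house random walk
SIGMA_INDEX = 0.0038  # sd of the estimated index change, May 2004 to May 2007 (Eq. 15)
MONTHS = 36           # May 2004 to May 2007

# Independent parts, so variances add: noise at both dates,
# 36 monthly random-walk steps, and index uncertainty.
variance = 2 * SIGMA_EPS**2 + MONTHS * SIGMA_HOUSE**2 + SIGMA_INDEX**2
sd_log = math.sqrt(variance)
sd_pct = math.exp(sd_log) - 1.0  # a one-sd increase in log value, as a percentage

print(f"sd of log value: {sd_log:.3f}")  # prints 0.139
print(f"one-sd move: {100 * sd_pct:.1f}%")
```

The transaction noise term dominates together with the house-specific random walk; the index uncertainty contributes almost nothing at this horizon.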
Conclusions
In this paper we estimate the local linear trend repeat sales model as an alternative to repeat sales models in which the log price levels are fixed unknown parameters. For large samples, the differences between the models are small: whether a structure is imposed a priori (random walk with drift and local linear trend models) or not (Case and Shiller model), the estimation results depend entirely on the data and not on the a priori structure. However, the local linear trend repeat sales model can also be used to construct price indices in thin markets, with only a small number of repeat sales, and for short time intervals. Using this model considerably reduces the impact of transaction price noise on the estimated house price trends, and, as a result of the underlying trend model, the estimated price indices are stable.
The local linear trend repeat sales model can be interpreted as a modification of the Goetzmann (1992) approach. First, the ‘constant’ appreciation rate assumption (random walk with drift) is replaced by a more realistic ‘time-varying’ appreciation rate (local linear trend model). Second, the signal-to-noise ratios are estimated by maximizing the concentrated likelihood function, which avoids the somewhat ad hoc two-step procedure that overestimates the signal-to-noise ratio and hence produces more volatile return series.
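To illustrate what maximizing a concentrated likelihood over signal-to-noise ratios involves, the sketch below applies it to a simplified setting: a single, directly observed series following a local linear trend, rather than the repeat sales regression in which the trend acts as a prior for the price levels. The level and drift play the roles of β_{t} and κ_{t}; because this section does not restate which disturbance carries q_{ζ} and which q_{ξ}, the hypothetical names `q_level` and `q_slope` are used. The Kalman filter runs with σ² set to one so that σ² can be concentrated out analytically; a large initial variance stands in for an exact diffuse initialization, and a coarse grid search replaces a numerical optimizer. All names and the simulated data are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def llt_concentrated_loglik(y: Sequence[float], q_level: float, q_slope: float,
                            kappa: float = 1e7) -> Tuple[float, float]:
    """Concentrated log-likelihood of a univariate local linear trend model.

    y_t      = mu_t + eps_t,           eps_t  ~ N(0, sigma^2)
    mu_{t+1} = mu_t + nu_t + zeta_t,   zeta_t ~ N(0, q_level * sigma^2)
    nu_{t+1} = nu_t + xi_t,            xi_t   ~ N(0, q_slope * sigma^2)
    Returns (log-likelihood with sigma^2 concentrated out, sigma^2 estimate).
    """
    if len(y) < 3:
        raise ValueError("need at least three observations")
    a0, a1 = 0.0, 0.0                  # predicted level and slope
    p00, p01, p11 = kappa, 0.0, kappa  # predicted state covariance (sigma^2 = 1)
    ssq, sum_log_f, m = 0.0, 0.0, 0
    for t, obs in enumerate(y):
        v = obs - a0                   # one-step-ahead prediction error
        f = p00 + 1.0                  # its variance, in units of sigma^2
        k0, k1 = (p00 + p01) / f, p01 / f  # gain K = T P Z' / F
        if t >= 2:                     # skip the two quasi-diffuse start-up steps
            ssq += v * v / f
            sum_log_f += math.log(f)
            m += 1
        # a_{t+1} = T a_t + K v_t with T = [[1, 1], [0, 1]]
        a0, a1 = a0 + a1 + k0 * v, a1 + k1 * v
        # P_{t+1} = T P T' - F K K' + diag(q_level, q_slope)
        p00, p01, p11 = (p00 + 2.0 * p01 + p11 - f * k0 * k0 + q_level,
                         p01 + p11 - f * k0 * k1,
                         p11 - f * k1 * k1 + q_slope)
    sigma2 = ssq / m                   # ML estimate of sigma^2 given the q's
    loglik = (-0.5 * m * (math.log(2.0 * math.pi) + 1.0 + math.log(sigma2))
              - 0.5 * sum_log_f)
    return loglik, sigma2


def fit_by_grid(y: Sequence[float], grid: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (loglik, q_level, q_slope, sigma2) at the best grid point."""
    candidates: List[Tuple[float, float, float, float]] = []
    for q_level in grid:
        for q_slope in grid:
            ll, s2 = llt_concentrated_loglik(y, q_level, q_slope)
            candidates.append((ll, q_level, q_slope, s2))
    return max(candidates)             # tuples compare on log-likelihood first


# Simulated monthly log price series: deterministic trend plus noise (illustrative).
rng = random.Random(42)
y = [1.0 + 0.01 * t + rng.gauss(0.0, 0.05) for t in range(120)]
ll, q_level, q_slope, s2 = fit_by_grid(y, [0.0, 0.01, 0.1, 1.0, 10.0])
print(f"q_level={q_level}, q_slope={q_slope}, sigma={math.sqrt(s2):.3f}")
```

Because the simulated series has a fixed trend, the grid search should favor small signal-to-noise ratios. A production version would optimize over log q with an exact diffuse Kalman filter instead of a grid.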
In the local linear trend and random walk with drift repeat sales models, both estimated by maximum likelihood, the standard deviation of the estimated monthly returns is almost insensitive to the sample size: for the local linear trend model it varies between 0.0039 (n = 846,439) and 0.0046 (n = 2,511), which is about 0.6–0.7 times the average monthly return. In the Case and Shiller and Goetzmann models, the standard deviation decreases with the number of observations: it varies between 0.0043 and 0.0429 for the Goetzmann model and between 0.0047 and 0.0895 for the Case and Shiller model. For small samples, the monthly standard deviation in these models is 6 to 13 times as large as the average monthly return. This implies that the Case and Shiller model and the two-stage Bayes variant of the Goetzmann model cannot be used to construct reliable detailed price indices and returns.
In addition, the local linear trend repeat sales model allows us to examine the effect of the time between repeat sales on the estimated price level. Empirical evidence shows that large profits are made when the time between sales is relatively short, say within the first six months. For that reason, a new variable containing the reciprocal of the time between sales is included in the repeat sales model; it provides a satisfactory description of the empirical evidence.
The structural time series approach used in this article allows further generalizations, such as the inclusion of seasonal effects, the specification of hierarchical trends (see Francke and De Vos 2000), or common factors for different market segments. The impact of outliers can also be reduced by assuming that the transaction price noise follows a t-distribution. In future research, these generalizations can be dealt with within the state-space framework.
Acknowledgements
I would like to thank two anonymous referees, David Geltner, the participants of the Maastricht-MIT-NUS 2008 symposium, and Sunčica Vujić for helpful comments.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.