A spatial model averaging approach to measuring house prices


We present a novel approach to the estimation of hedonic imputation (HI) price indices for real estate markets using a new Mallows model averaging (MMA) estimator that is robust to spatial dependence. The spatial MMA (SMMA) method is explicitly designed to minimize the quadratic forecast loss of the imputed sales transactions that comprise the HI index when sales transactions are spatially correlated. We apply the SMMA HI approach to a sales transaction dataset for three geographic real estate suburbs of Auckland, New Zealand. The SMMA HI price indices outperform conventional OLS HI methods, exhibiting more accurate out-of-sample prediction and tighter in-sample confidence intervals. The SMMA HI method is expected to offer practitioners enhancements in constant-quality price index accuracy in data sparse environments where model overfitting is a concern, such as high frequency price measurement or highly localized geographies.


An accurate method for tracking price changes over time provides important information in any market. Price measurement is a relatively straightforward exercise when the market is characterized by frequently transacted homogeneous goods. However, this is rarely the case in real estate markets. Properties sell infrequently and are inherently heterogeneous, differing markedly in characteristics such as location, floor area, lot size and building construction. These features present significant challenges to the accurate measurement of trends in property prices.

Hedonic methods are increasingly being employed to meet these challenges (Hill 2013). One of the primary strengths of a hedonic approach is that it can produce price indices that account for differences in the quality of houses transacted. Silver (2016) refers to hedonic methods as the “gold standard” in constant-quality house price measurement. In addition, hedonic imputation (HI) methods, which use fitted house prices from a hedonic regression as the basis for index construction, can overcome the “new goods problem” (Pakes 2003), which is likely to be particularly acute in real estate markets during periods of heightened residential development, rezoning and redevelopment.Footnote 1

Despite these strengths, hedonic methods come with their own unique challenges. In particular, they require the practitioner to specify an underlying auxiliary regression function that is used to control for differences in housing quality. Although econometric theory is replete with methods to assist practitioners in model specification decisions, the problem of how to choose a model specification is often not discussed in applications of hedonic methods. For example, in their survey of hedonic studies, Sirmans et al. (2005, p. 5) conclude that “[T]he functional form of the model and the variables included in the model can often seem ad hoc”. Yet the model selection decision is of critical importance. A model that is too parsimonious is likely to suffer from misspecification bias, undermining the capacity of the regression to adequately control for differences in quality.Footnote 2 A model that is too large runs the risk of including marginally relevant or altogether irrelevant attributes that similarly undermine model accuracy through overfitting. This trade-off between misspecification bias and overfitting variance is particularly acute in situations where there are few transactions to estimate the hedonic regression, which is likely to be the case for small geographies, such as suburbs or neighborhoods, or high frequency measures, such as monthly or weekly price indices.

This paper addresses this problem by proposing a new method for specifying hedonic regressions in the construction of hedonic imputation (HI) indices. Rather than select a single hedonic specification from a candidate set of models, we propose a model averaging solution, motivated by the substantial body of research showing that model averaging produces more accurate predictions than selecting a single model (Liang et al. 2011; Hansen 2014; Zhang et al. 2016). The model averaging method assigns a weight to each regression model within the candidate set of hedonic regression specifications. Because HI indices are based on predicted transaction prices, the model averaging weights are optimized to minimize the expected quadratic loss of the individual predictions. This yields a model averaging method that is analogous to the Mallows Model Averaging (MMA) of Hansen (2007, 2008), adjusted to account for spatial dependence between housing transactions using the Conley (1999) nonparametric spatial kernel. The resultant spatial MMA (SMMA) features a larger penalty on model dimensionality when there is positive spatial correlation between transaction prices. Although the SMMA method is employed in a real estate framework, its applicability is not limited to real estate, and it could be used in any regression application that features spatial panels with weak dependence in the errors.

We showcase the performance of the SMMA HI method by constructing quarterly house price indices for three suburbs in the Auckland, New Zealand, real estate market. These sub-markets are each relatively sparse in terms of the number of quarterly sales transactions, averaging between 156 and 225 observations in each quarter. The bias-variance trade-off in the selection of a model size is likely to be particularly acute in smaller samples, making these suburbs ideal applications to examine the SMMA HI method. The results demonstrate that the SMMA HI approach can produce both tighter confidence intervals on price index growth estimates and superior out of sample price growth predictions over an “ad hoc” ordinary least squares (OLS) HI estimation method. The SMMA method also outperforms HI index construction in which explanatory variables are selected via the Bayesian Information Criterion (BIC).

Overall, this paper makes several key contributions to the literature. First, it proposes a new type of MMA estimator for use in linear regression models characterized by spatially correlated errors. This method is not limited to real estate applications: It can be applied more generally to spatial regression models provided there is a measure of distance between observations. Second, it tackles the often overlooked issue of hedonic model specification in price measurement by applying the spatial model averaging approach to determine model specification. Finally, it also contributes to a growing literature focused on small sample constant-quality price measurement in real estate markets (see Bokhari and Geltner 2012; Bollerslev et al. 2016; Bourassa and Hoesli 2017; Hill et al. 2017).

The remainder of the paper is structured as follows. Section two discusses the literatures of price measurement in real estate markets, model averaging, and spatial econometrics. Section three outlines the methodology used in the paper, detailing the Hansen (2007) MMA procedure, before extending the approach to account for spatial dependence and heterogeneity. The hedonic imputation price index method is also introduced. Section four presents the empirical application of the SMMA HI method, detailing the data utilized and the obtained results. This section also contains an evaluation of the performance of the SMMA HI index relative to feasible ad hoc and BIC-based alternatives. Finally, section five concludes.

Related literature

This paper is directly related to the literatures on price measurement and model averaging. Each is discussed in turn.

Many published constant quality house price indices rely on repeat sales to estimate price growth through the use of matched sales pairs (see the repeat sales (RS) index approaches of Bailey et al. 1963; Case and Shiller 1987; and Case et al. 1991). Hill (2013) notes that these types of approaches suffer from several issues which can cause bias in price measures. First, they can omit substantial amounts of data due to the requirement for two observed sales transactions. An outcome of this is that RS type indices omit information on newly constructed properties, more generally referred to as the “new goods problem” in price measurement (Pakes 2003). Second, the attributes of the property may also substantially change from one sale to the next. These characteristics are not restricted to inherent features of the property. They include, for example, land use regulations and other non-zoning regulatory policies that affect the redevelopment option value of the property (Clapp et al. 2012). Third, there is the “lemons” bias (Clapp and Giaccotto 1992), whereby the index is weighted towards frequently transacted properties. Last, there is the fact that the indices are subject to revision over time as new sales transactions are added to the dataset.Footnote 3 The Sales-Price to Appraisal Ratio (SPAR) approach can also be considered a special case of the RS method, wherein property sales prices are compared against their most recent government valuation estimate instead of a previous sales transaction price. See Bourassa et al. (2006) for further discussion of the method.

Such issues with extant RS type index approaches have led to the promotion of hedonic methods for constructing constant quality residential house price indices (see p. 159, Eurostat 2013), and statistical agencies are increasingly adopting hedonic methods for house price measurement (Silver 2016). Hill (2013) attributes the more recent application of hedonic methods in real estate to modern advances in both data collection and computation ability. In the case of Australasia, there now exists a substantial body of research utilizing hedonic index methods (for instance, see Bourassa et al. 2006; Hansen 2006; Hill and Melser 2008; Hill et al. 2009; Goh et al. 2012; Hill et al. 2017; Hill and Scholz 2017). Nonetheless, hedonic indices are also criticized on the basis of computational burden and irreproducibility. Goh et al. (2012) note that the approach is relatively computationally and data intensive, limiting its applicability where sales transaction data quality is poor. A second criticism is that hedonic approaches rely on the practitioner specifying an underlying functional form for the hedonic regression(s). Shiller (2008) argues that decisions about the specification of the hedonic function create a lack of transparency and make indices difficult to reproduce. Hill (2013) discusses how this discretion can also lead to misspecification problems that bias results. However, as Malpezzi (2008) notes, despite the known risks misspecification presents, there is a surprising lack of hedonic measurement research utilizing formal specification tests. This paper can be viewed as a partial solution to problems stemming from opaque specification decisions by providing a data-dependent method for the selection of variables to include in the hedonic regression function.

There are two types of hedonic indices: the time-dummy approach and the hedonic imputation (HI) method. Hill (2013) provides a detailed taxonomy of the two approaches. He notes that the time dummy approach is the computationally simpler of the two and allows for straightforward estimation of index standard errors through the variance-covariance matrix of the OLS time-dummy regression. But this comes at the cost of flexibility, as the approach enforces the strict assumption that the shadow prices of hedonic characteristics do not change over time. Additionally, the approach does not allow practitioners to specify an index formula (Silver 2016). In contrast, the HI method allows hedonic coefficients to evolve over time by (re)estimating the hedonic function for each time period in a sample, and also offers the flexibility to use a wider range of index formulas (see Triplett (2006) and Hill (2013) for discussions on index formula choice in hedonic imputation indexing). Silver and Heravi (2007) provide a comprehensive review of hedonic imputation and hedonic time dummy index approaches.

Model averaging based on an objective model selection criterion represents a key solution to potential misspecification issues in hedonic functions used in hedonic imputation indices. Model averaging represents an extension of the forecast combination literature established by Barnard (1963), Bates and Granger (1969), and Granger and Ramanathan (1984) who highlighted how superior forecast results could be obtained via the averaging of related competing forecast models. Following this, development of the model averaging approach was primarily pursued in the Bayesian model averaging (BMA) context (for reviews of model averaging in economics, see Claeskens and Hjort 2008; Moral-Benito 2015; Steel 2017). More recently, frequentist methods have been developed, such as the least squares model averaging of Hansen (2007, 2008, 2014). The approach relies on the use of a Mallows (1973) information criterion to estimate weights for a set of candidate models. Although the original Hansen (2007) paper was silent on the effects of heteroskedasticity and correlation on the functional form of the Mallows criterion, it has subsequently been further developed to account for heteroskedasticity (see Hansen and Racine 2012; Liu and Okui 2013; Liu et al. 2016). However, spatial correlation in regression errors is an omnipresent consideration in real estate price measurement (see Pace et al. 1998; Malpezzi 2008; Holly et al. 2010; Hill 2013; Helbich et al. 2014; Hill et al. 2017; Hill and Scholz 2017, and references within), calling for averaging methods to be developed that can account for both heteroskedasticity and spatial correlation.

Spatial methods for econometric analysis have been extensively studied within the econometrics and broader social science literatures since at least 1979 (see Anselin 2010, for a review). Specifically, spatial econometric approaches offer methods for addressing issues related to the non-random ordering of observations in geographic space (Anselin 1988). Elhorst (2010) notes that three key methods have been developed in the spatial econometrics literature to examine spatial interaction effects: maximum likelihood (ML) estimation, generalized method of moments (GMM) estimation, and Bayesian Markov Chain Monte Carlo (MCMC) approaches. Furthermore, the literature has also been extended to specifically consider spatially robust model averaging estimators. However, the focus of this literature has been largely restricted to Bayesian Model Averaging (see LeSage and Parent 2007; Cotteleer et al. 2011), while spatial techniques using frequentist model averaging estimators have, so far, attracted little attention. Two recent exceptions are Zhang and Yu (2018) and Liao et al. (2019), who each develop parametric spatial autoregressive model averaging estimators. Zhang and Yu (2018) parameterize spatial correlation into the regression function, while Liao et al. (2019) consider a parametric spatial correlation structure in the error (but preclude heteroskedasticity). Our work complements this line of research by permitting non-parametric spatial dependence structures in the error.


In this section we introduce the spatially robust Mallows model averaging method and demonstrate its relevance to HI index construction. We begin by introducing the conventional Hansen (2007) Mallows model averaging. We then show how the basic idea can be extended to account for heteroskedasticity following the approach of Liu and Okui (2013) (henceforth “LO”), and then weak dependence in a spatial framework. We end by introducing the hedonic imputation method.

Mallows model averaging

Following Hansen (2007) we consider a random sample of \(\left\{ y_{i},x_{i}^{\prime }\right\} _{i=1}^{n}\), where \(y_{i}\) is a scalar and \(x_{i}=(x_{i,1},x_{i,2},...)^{\prime }\) is a countably infinite-dimensional vector of potential explanatory variables. The data generating process (DGP) is given by

$$\begin{aligned} y_{i}=\mu _{i}+e_{i} \end{aligned}$$

where \(\mu _{i}=\sum _{j=1}^{\infty }\theta _{j}x_{i,j}\), and \(E(e_{i}|x_{i})=0\). Now, consider a sequence of M candidate (approximating) models \(m=1,2,...,M\) where each model has \(k_{m}>0\) regressors drawn from \(x_{i}\). The mth approximating model of (1) can be written as

$$\begin{aligned} y_{i}=\sum _{j=1}^{k_{m}}\theta _{j(m)}x_{i,j(m)}+b_{i(m)}+e_{i} \end{aligned}$$

where \(x_{i,j(m)}\) for \(j=1,2,...,k_{m}\) denotes the regressors in the mth model, \(\theta _{j(m)}\) denotes the mth model coefficient, and \(b_{i(m)}=\mu _{i}-\sum _{j=1}^{k_{m}}\theta _{j(m)}x_{i,j(m)}\) is a model approximation error arising from the truncation of the infinite series \(x_{i}\). Thus, \(m=1,2,...\) indexes the set of candidate models under consideration for describing \(\left\{ y_{i}\right\} _{i=1}^{n}\) and \(k_{m}\) denotes the number of regressors in the mth model.

The original Hansen (2007) model restricts the regressors \(x_{i}\) to be an ordered set, with each candidate model m containing the first \(k_{m}\) regressors from \(x_{i}\). In other words, each model \(m=1,2,...,M\) in the ordered set nests the previous candidate model, such that \(0<k_{1}<k_{2}<...<k_{M}\). Wan et al. (2010) and Hansen and Racine (2012) demonstrate that the MMA estimator remains optimal without the nesting restriction. Note that \(k_{m}\) is not restricted to be growing in increments of one. Instead, each new model can incorporate a set of new regressors. Hansen (2014) highlights that the ability to group regressors can be beneficial in cases such as when the model includes locational fixed effects.

In matrix notation the mth model can be expressed as \(Y = \mu _{m} + e\), where \(Y = (y_{1},...,y_{n})^{\prime }\), \(\mu _{m} = (\mu _{1(m)},...,\mu _{n(m)})^{\prime }\), and \(e = (e_{1},...,e_{n})^{\prime }\). The projection matrix for each model m is defined as \(P_{m} = X_{m}(X_{m}^{\prime }X_{m})^{-1} X_{m}^{\prime }\), where \(X_{m}\) is an \(n \times k_{m}\) matrix of regressors with elements \(x_{i,j(m)}\), giving the OLS estimate of \(\mu\) for model m as \(\hat{\mu }_{m} = P_{m} Y\) and the residual as \(\hat{e}_{m} = Y - \hat{\mu }_{m}\).

Rather than choose a single model from the set of candidate models indexed by \(m=1,...,M\), under model averaging the goal is to assign a weight \(w_{m}\) to each candidate model and take a weighted average across models. The vector of weights \(W=(w_{1},w_{2},...,w_{M})^{\prime }\) is restricted to the unit simplex in \({\mathbb{R}}^{M}\):

$$\begin{aligned} {\mathcal{H}}_{n}=\left\{ W\in [0,1]^{M}:\sum _{m=1}^{M}w_{m}=1\right\} \end{aligned}$$

The model averaging estimator for a given weight vector W is then

$$\begin{aligned} \hat{\mu }(W)=\sum _{m=1}^{M}w_{m}\hat{\mu }_{m}=\sum _{m=1}^{M}w_{m}P_{m}Y=P(W)Y \end{aligned}$$

Note that P(W), the weighted projection matrix, does not necessarily possess the properties of a traditional OLS projection matrix (such as idempotency), except in the case that \(w_{m}=1\) for some m.
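The construction of \(P(W)\) can be illustrated numerically. The following is a minimal sketch only: the simulated data, the nested model sizes and the weight vector are illustrative assumptions, not values from the paper. It forms each \(P_{m}\), averages them, and checks that the trace of \(P(W)\) equals \(\sum _{m}w_{m}k_{m}\) while \(P(W)\) itself is not idempotent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Hypothetical regressors: an intercept plus three standard normal columns.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.25, 0.0]) + rng.normal(size=n)


def projection(Xm):
    """P_m = X_m (X_m' X_m)^{-1} X_m' for an n x k_m regressor matrix."""
    return Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)


# Nested candidate models: the m-th model uses the first k_m columns of X.
sizes = [1, 2, 4]
P = [projection(X[:, :k]) for k in sizes]

# Illustrative weights on the unit simplex (assumed, not estimated).
w = np.array([0.2, 0.3, 0.5])
P_w = sum(wm * Pm for wm, Pm in zip(w, P))

mu_hat = P_w @ y                                # model-averaged fitted values
trace_P_w = np.trace(P_w)                       # equals sum_m w_m * k_m
is_idempotent = np.allclose(P_w @ P_w, P_w)     # fails unless one weight is 1
```

Because each \(P_{m}\) has trace \(k_{m}\), the trace of the weighted matrix is a weighted count of regressors, which is the quantity the Mallows penalty is built on.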

The principle behind Mallows model averaging is to select a vector of weights, W, which minimizes the quadratic forecast risk of \(\hat{\mu } \left( W\right) ,\) i.e., \({\rm{E}}\left[ \left( \hat{\mu }\left( W\right) -Y\right) ^{\prime }\left( \hat{\mu }\left( W\right) -Y\right) \right]\). Hansen shows that under the assumption that the error terms are iid,  we have

$$\begin{aligned} {\rm{E}}\left[ \left( \hat{\mu }\left( W\right) -Y\right) ^{\prime }\left( \hat{\mu }\left( W\right) -Y\right) \right] ={\rm{E}}\left[ \left( \sum _{m=1}^{M}w_{m}\left( b_{i(m)}+e_{i}\right) \right) ^{2}\right] +\sigma ^{2}\sum _{m=1}^{M}w_{m}k_{m} \end{aligned}$$

which can be consistently estimated for the set of weights satisfying (3) using the Mallows criterion:

$$\begin{aligned} C_{n}^{Mallows}(W)=\hat{e}(W)^{\prime }\hat{e}(W)+2S^{2} \sum _{m=1}^{M}w_{m}k_{m}, \end{aligned}$$

where

$$\begin{aligned} \hat{e}(W)=Y-\hat{\mu }(W)=\sum _{m=1}^{M}w_{m}\hat{e}_{m} \end{aligned}$$

is a vector of residuals and \(S^{2}\) is an estimate of the variance of \(\left\{ e_{i}\right\} _{i=1}^{n}\).Footnote 4 Hansen suggests that the latter is best approximated by the unconditional variance of the errors of the \(M\)th candidate model; that is, \(E(e_{i(M)}^{2})\simeq S^{2}\) and so \(S^{2}=\frac{1}{n}\hat{e}_{M}^{\prime } \hat{e}_{M}\) in practice. Weights are selected by choosing W to minimize the estimated quadratic forecast loss \(C_{n}^{Mallows}(W)\) in (5):

$$\begin{aligned} \hat{W}=(\hat{w}_{1},\hat{w}_{2},...,\hat{w}_{M})^{\prime }=\arg \min _{W\in {\mathcal{H}} _{n}}C_{n}^{Mallows}(W) \end{aligned}$$

The weights are therefore the solution to a quadratic programming problem.

Note that the first term on the right-hand side of (5) is the in-sample quadratic loss of the averaged model. This would be minimized by setting \(w_{M}=1\), i.e., selecting the largest model in the candidate set (in other words, the OLS estimator for candidate model M). The second term is a penalty on the dimensionality of the averaged model. This would be minimized by setting \(w_{1}=1\), i.e., selecting the smallest model in the candidate set. Thus, MMA weights are selected to balance between misspecification (decreasing in \(k_{m}\)) as captured by the first term, and over-parameterization (increasing in \(k_{m}\)) as captured by the second term. This trade-off is sometimes referred to as the bias-variance trade-off. Parsimonious models are associated with higher estimator bias (i.e. misspecification) but lower prediction variance. Conversely, highly parameterized models have lower estimator bias but higher prediction variance.
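The quadratic programming problem can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the data are simulated, the candidate models are nested on the first \(k_{m}\) columns, and SciPy's general-purpose SLSQP routine stands in for a dedicated quadratic programming solver. The function name `mma_weights` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def mma_weights(y, X, sizes):
    """Mallows weights for nested models using the first k_m columns of X."""
    n = len(y)
    # Residual vectors e_hat_m, one column per candidate model.
    E = np.column_stack([
        y - X[:, :k] @ np.linalg.lstsq(X[:, :k], y, rcond=None)[0]
        for k in sizes
    ])
    k = np.asarray(sizes, dtype=float)
    s2 = E[:, -1] @ E[:, -1] / n      # S^2 from the largest model, as in the text
    G = E.T @ E                       # e_hat(W)' e_hat(W) = W' G W

    def criterion(w):
        return w @ G @ w + 2.0 * s2 * (k @ w)

    def gradient(w):
        return 2.0 * (G @ w) + 2.0 * s2 * k

    M = len(sizes)
    res = minimize(
        criterion, np.full(M, 1.0 / M), jac=gradient, method="SLSQP",
        bounds=[(0.0, 1.0)] * M,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x, criterion


# Hypothetical data with decaying coefficients.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
theta = 1.0 / np.arange(1, 7)
y = X @ theta + rng.normal(size=n)

weights, crit = mma_weights(y, X, sizes=[1, 2, 3, 4, 5, 6])
```

The objective is quadratic in \(W\) through the Gram matrix of residuals and linear in \(W\) through the penalty, so any solver that handles simplex constraints can be substituted.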

Spatial MMA

The second term on the right-hand side of (4) is a model complexity term that accounts for the impact of estimator variance on quadratic forecast risk. It is a (weighted average of a) quadratic form of the covariance of the estimated model parameters. To see this, let \(\Omega := {\rm{E}}\left( ee^{\prime }\right)\) and note that \(\Omega =\sigma ^{2}I_{n}\) under the iid assumption on \(\left\{ e_{i}\right\} _{i=1}^{n}\). Then

$$\begin{aligned} \begin{aligned} {\rm{tr}}[\Omega P(W)]&=\sum _{m=1}^{M}w_{m}{\rm{tr}}[X_{m}^{\prime }\Omega X_{m}(X_{m}^{\prime }X_{m})^{-1}] \\&=\sigma ^{2}\sum _{m=1}^{M}w_{m}{\rm{tr}}[X_{m}^{\prime }X_{m}(X_{m}^{\prime }X_{m})^{-1}] \\&=\sigma ^{2}\sum _{m=1}^{M}w_{m}k_{m} \end{aligned} \end{aligned}$$

where we recall that the covariance of the OLS estimator is \((X_{m}^{\prime }X_{m})^{-1}X_{m}^{\prime }\Omega X_{m}(X_{m}^{\prime }X_{m})^{-1}\). The analytic expression for the model complexity term will change when the iid assumption on the errors is relaxed to permit heterogeneity or dependence. In the case of heteroskedasticity (and independence), \(\Omega\) is a diagonal matrix with \(\left\{ \sigma _{i}^{2}\right\} _{i=1}^{n}\) along the principal diagonal. In this case

$$\begin{aligned} {\rm{tr}}[\Omega P(W)]=\sum _{m=1}^{M}w_{m}{\rm{tr}}\left[ \sum _{i=1}^{n}x_{i(m)}x_{i(m)}^{\prime }\sigma _{i}^{2}(X_{m}^{\prime }X_{m})^{-1}\right] \end{aligned}$$

and consequently the penalty on model dimensionality in (5) must also be adjusted to reflect heteroskedasticity. LO propose the penalty term

$$\begin{aligned} \tilde{T}^{LO}=2\sum _{m=1}^{M}w_{m}{\rm{tr}}\left[ \sum _{i=1}^{n}\hat{e} _{i(M)}^{2}x_{i(m)}x_{i(m)}^{\prime }(X_{m}^{\prime }X_{m})^{-1}\right] =2\sum _{i=1}^{n}\hat{e}_{i(M)}^{2}p_{ii}(W) \end{aligned}$$

in place of \(2S^{2}\sum _{m=1}^{M}w_{m}k_{m}\) in (5), where \(\hat{e}_{i\left( M\right) }\) is a residual from a preliminary estimation of a model including all regressors, and \(p_{ii}(W)\) is the ith diagonal element of P(W).
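The LO penalty can be computed directly from the diagonal of \(P(W)\). The sketch below is illustrative only: the function name `lo_penalty`, the simulated heteroskedastic data and the weights are assumptions. As a sanity check, it also confirms that with constant squared residuals the penalty collapses to the Mallows form \(2S^{2}\sum _{m}w_{m}k_{m}\).

```python
import numpy as np


def lo_penalty(X, sizes, w, resid_full):
    """2 * sum_i e_i^2 p_ii(W) for nested models on the first k_m columns."""
    p_diag = np.zeros(X.shape[0])
    for wm, k in zip(w, sizes):
        Xm = X[:, :k]
        # Diagonal of P_m = X_m (X_m'X_m)^{-1} X_m' without forming the n x n matrix.
        A = np.linalg.solve(Xm.T @ Xm, Xm.T)          # k_m x n
        p_diag += wm * np.einsum("ij,ji->i", Xm, A)
    return 2.0 * np.sum(resid_full**2 * p_diag)


# Hypothetical heteroskedastic data.
rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.3, 0.1]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

sizes, w = [1, 2, 4], np.array([0.2, 0.3, 0.5])
resid_full = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
penalty = lo_penalty(X, sizes, w, resid_full)

# With every squared residual equal to 4, the penalty is 2 * 4 * sum_m w_m k_m.
check = lo_penalty(X, sizes, w, np.full(n, 2.0))
```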

We extend the LO penalty term to permit spatial correlation. In this case \(\Omega\) is no longer diagonal. But if the correlation is weak and dies off at a suitably fast rate as the distance between observations increases, we can employ conventional kernel based methods to estimate the \(k_{m}\times k_{m}\) matrix \(\Phi _{m}:=X_{m}^{\prime }\Omega X_{m}\). We propose the following penalty function:

$$\begin{aligned} \tilde{T}^{SP}=\sum _{m=1}^{M}w_{m}{\rm{tr}}\left[ \hat{\Phi } _{m}^{SP}\left( \sum _{i=1}^{n}x_{i(m)}x_{i(m)}^{\prime }\right) ^{-1}\right] \end{aligned}$$

where \(\hat{\Phi }_{m}^{SP}\) is a Conley (1999) spatial heteroskedasticity and autocorrelation consistent (HAC) estimator:

$$\begin{aligned} \hat{\Phi }_{m}^{SP}=\sum _{i=1}^{n}x_{i(m)}x_{i(m)}^{\prime }\hat{e} _{i(M)}^{2}+\sum _{i=1}^{n}\sum _{j=i+1}^{n}K(d_{i,j})\hat{e}_{i(M)}\hat{e} _{j(M)}(x_{i(m)}x_{j(m)}^{\prime }+x_{j(m)}x_{i(m)}^{\prime }) \end{aligned}$$

Here \(K(d_{i,j})\) is a linear kernel function:

$$\begin{aligned} K(d_{i,j})={\left\{ \begin{array}{ll} 1-\frac{dist_{i,j}^{PLN}}{bw_{dist}^{PLN}} & \text{for } dist_{i,j}^{PLN}<bw_{dist}^{PLN} \\ 0 & \text{otherwise} \end{array}\right. } \end{aligned}$$

where \(dist_{i,j}^{PLN}\) represents the absolute planar distance between two observations i and j.Footnote 5 \(K(d_{i,j})\) is a triangular (or Bartlett) kernel for spatial data. Conley (1999) justifies use of this type of kernel on the basis that it satisfies the conditions for ensuring positive semi-definite covariance matrices. The kernel is also commonly employed in Newey-West type autocorrelation consistent estimation in time series applications (e.g. Kim and Sun 2013). The bandwidth \(bw_{dist}^{PLN}\) is selected by the practitioner. Observation pairs whose planar distance exceeds the bandwidth are assigned a correlation of zero in estimation.
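A minimal sketch of the kernel weights follows, assuming planar coordinates are available as an \(n\times 2\) array and that distance is Euclidean. The function name `bartlett_kernel`, the three example locations and the bandwidth of 2 are illustrative choices.

```python
import numpy as np


def bartlett_kernel(coords, bandwidth):
    """Pairwise triangular kernel weights from planar coordinates (n x 2)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    # Weight declines linearly in distance and is zero at or beyond the bandwidth.
    return np.where(dist < bandwidth, 1.0 - dist / bandwidth, 0.0)


# Three hypothetical locations: pairwise distances are 5, 1 and about 4.24.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
K = bartlett_kernel(coords, bandwidth=2.0)
```

Only the pair separated by a distance of 1 falls inside the bandwidth, so it receives a weight of 0.5 and the other off-diagonal entries are zero.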

In the Appendix we show how the kernel can be extended to include information on the elevation of each observation to generate measures of distance in three dimensions. Elevation may be relevant for observations located on severely undulating topographies or in applications in which observations can be more generally situated in three-dimensional space (such as apartments in multi-storey buildings). See Fleming et al. (2018) for an example of hedonic modelling of real estate prices of an amenity with consideration to multidimensional space.

The spatially-robust model averaging criterion is then

$$\begin{aligned} \hat{SRC_{p}}(W)=\hat{e}(W)^{\prime }\hat{e}(W)+2\sum _{m=1}^{M}w_{m}{\rm{tr}}\left[ \hat{\Phi }_{m}^{SP}\left( \sum _{i=1}^{n}x_{i(m)}x_{i(m)}^{\prime }\right) ^{-1}\right] \end{aligned}$$

where SMMA weights are chosen to minimize \(\hat{SRC_{p}}(W):\)

$$\begin{aligned} \hat{W}^{SMMA}=\arg \min _{W\in {\mathcal{H}}_{n}}\hat{SRC_{p}}(W) \end{aligned}$$

Estimation of \(\hat{W}^{SMMA}\) is again achieved via quadratic programming.

Given (11), the relationship between the original Hansen (2007) criterion and the SMMA criterion is as follows. In the case of zero spatial correlation (\(K(d_{i,j})=0\) for \(i\ne j\)), the second term of (9) is eliminated, which reduces the penalty on model dimensionality to that of LO. As outlined earlier, and shown by LO, this criterion further reduces to the standard \(C_{n}\) MMA estimator under an assumption of homoskedasticity.
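The SMMA weights can be sketched end to end as follows. This is a sketch under stated assumptions, not the authors' code: the locations lie on a hypothetical 8 by 8 grid, the errors have an assumed exponential spatial covariance, the models are nested, and SciPy's SLSQP routine stands in for a quadratic programming solver. The spatial HAC matrix is formed as \(X_{m}^{\prime }(K\circ \hat{e}_{M}\hat{e}_{M}^{\prime })X_{m}\), which matches the double-sum expression when the kernel matrix has ones on its diagonal. Passing an identity kernel matrix reproduces the LO penalty.

```python
import numpy as np
from scipy.optimize import minimize


def smma_weights(y, X, sizes, K):
    """Weights minimizing e(W)'e(W) + 2 * sum_m w_m tr[Phi_m (X_m'X_m)^{-1}].

    Assumes nested models on the first k_m columns of X and an n x n kernel
    matrix K with ones on the diagonal.
    """
    E = np.column_stack([
        y - X[:, :k] @ np.linalg.lstsq(X[:, :k], y, rcond=None)[0]
        for k in sizes
    ])
    e_full = E[:, -1]                     # residuals from the largest model
    V = K * np.outer(e_full, e_full)      # entries K(d_ij) * e_i * e_j
    pen = np.array([
        np.trace(np.linalg.solve(X[:, :k].T @ X[:, :k],
                                 X[:, :k].T @ V @ X[:, :k]))
        for k in sizes
    ])
    G = E.T @ E
    M = len(sizes)
    res = minimize(
        lambda w: w @ G @ w + 2.0 * (pen @ w),
        np.full(M, 1.0 / M),
        jac=lambda w: 2.0 * (G @ w) + 2.0 * pen,
        method="SLSQP", bounds=[(0.0, 1.0)] * M,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x, pen


# Hypothetical data: 8 x 8 grid of locations with spatially correlated errors.
rng = np.random.default_rng(3)
gx, gy = np.meshgrid(np.arange(8.0), np.arange(8.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
n = len(coords)
errors = np.linalg.cholesky(np.exp(-dist)) @ rng.normal(size=n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ (1.0 / np.arange(1, 6)) + errors
sizes = [1, 2, 3, 4, 5]

K = np.where(dist < 2.0, 1.0 - dist / 2.0, 0.0)        # Bartlett kernel, bandwidth 2
w_spatial, pen_spatial = smma_weights(y, X, sizes, K)
w_lo, pen_lo = smma_weights(y, X, sizes, np.eye(n))    # no spatial correlation
```

Comparing `pen_spatial` with `pen_lo` shows how much the kernel-weighted cross terms change the penalty on each candidate model for a given bandwidth.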

Alternative methods for incorporating spatial dependence

Under the spatial model averaging approach the spatial dependence in the panel is subsumed into the error term and estimated via the non-parametric Conley spatial HAC method. Alternative methods to account for spatial dependence include incorporating parametric correlation structures into the errors or parameterizing spatial dependence directly into the regression function. We briefly digress to discuss these alternative approaches under a model averaging paradigm.

Spatial error models specify a parametric structure for the dependence in the error terms and include spatial autoregressive and moving average error models (Anselin 1988). Many of these require the practitioner to select a weight matrix that specifies the distance between each pair of observations in the sample. It is straightforward to solve for the covariance given a parametric dependence structure and a weight matrix (although in practice spatial AR(1) or MA(1) coefficients must be estimated from first stage residuals). Such an approach would, however, require the practitioner to specify the correct dependence structure, which is somewhat antithetical to the motivation for adopting a model averaging method. An interesting extension would be to explore model averaging to address model uncertainty in the parametric error function, whereby the practitioner averages over different sets of regressors and different parametric structures for the error term, including different weighting matrices and functional forms.

Another parametric approach to modeling spatial dependence is the spatial lag model (Anselin 1988). Mixed spatial-regressive models include a spatial lag of the dependent variable in the set of regressors. OLS estimators of these models are generally inconsistent, meaning that OLS MMA methods such as that of Hansen (2007, 2008), LO or SMMA should not be applied. However, Lee (2002) provides conditions on the weight matrix that ensure consistency of OLS. Under these assumptions the spatial lag could be included as an additional regressor and the MMA method applied. In such cases, the LO heteroskedastic MMA method may be more appropriate since the spatial lag is likely to mop up much of the dependence in the error term. However, like many of the spatial error models, the spatial lag approach relies on the practitioner specifying the correct weighting matrix governing spatial dependence in the dependent variable of interest. For cases in which the practitioner has a set of spatial weights matrices to choose from, Zhang and Yu (2018) propose both model selection criteria and MMA approaches that can be employed under general conditions.

Finally, it is worth noting that geographic regressors can account for dependence between different observations and thus play a role in accommodating spatial dependence. For example, in the case of house prices, the distance to geographic amenities such as transportation corridors or green spaces is likely to be important in generating correlations in house prices.

Monte Carlo study

We undertake a brief simulation study to investigate the finite sample performance of the SMMA estimator in the case of spatially correlated data structures. The setup closely follows that of Hansen (2008) and Liu and Okui (2013), except the error terms exhibit heteroskedasticity and spatial correlation.

The data generating process is defined as in (1), with the infinite series truncated at \(j=10000\). For each simulation replication a random sample is drawn such that \(x_{i,1}=1\) and all other \(x_{i,j}\) are independent with \(x_{i,j}\sim N(0,1)\). The error term is generated as \(e|X\sim N(0,\Sigma )\), where \(\Sigma\) represents a variance-covariance matrix with a distance-based correlation function. The correlation matrix takes the form \(C(dist_{i,j})=1-\exp (-1/dist_{i,j}),\) where the cross sectional units are arranged in a “checkerboard” pattern in Euclidean space. Experiments based on other functional forms of spatial decay are qualitatively equivalent. The coefficients are determined according to Hansen (2008), where \(\theta _{j}=cj^{-1}\). Here varying the value of c in the set \(\{0.1,0.2,...,0.9\}\) determines the population \(R^{2}\) of the model. Finally, the sample sizes are varied between \(n=100\) and \(n=150\), the model sizes are varied between \(m=10\) and \(m=15\), and the number of simulation replications is 1000.

In addition to the SMMA, three other model averaging estimators are also estimated in order to assess comparative performance. These are the Mallows model averaging estimator of Hansen (2007) (MMA), the jackknife model averaging estimator (JMA) of Hansen and Racine (2012), and the heteroskedasticity-robust (\(HRC_{p}\)) estimator of Liu and Okui (2013). Estimator performance is measured by the mean square error (MSE) of the associated asymptotic risk of the coefficient estimates:Footnote 6

$$\begin{aligned} MSE^{n,m}=\frac{1}{1000}\sum _{r=1}^{1000}\left[ (\hat{\mu }_{r}^{n,m}-\mu _{r}^{n,m})^{\prime }(\hat{\mu }_{r}^{n,m}-\mu _{r}^{n,m})\right] \end{aligned}$$

where the superscripts n and m denote the number of observations and number of models, respectively, and r denotes the simulation replication.

Figure 1 illustrates the ratio of the MSE of the three alternative methods to the MSE of SMMA (denoted \(SRC_{p}\)). Ratios larger than one indicate SMMA has a smaller MSE on average. The results show that the standard MMA estimator is universally outperformed by the other estimators at smaller population \(R^{2}\) values and in smaller samples. The SMMA method outperforms the JMA and \(\hbox {HRC}_{{p}}\) approaches in simulations with higher population \(R^{2}\) values and in larger samples. For example, it outperforms in the \(n=150\) simulations when the population \(R^{2}\) exceeds 30%, and in the \(n=100\) sample when the \(R^{2}\) exceeds 50%. Although the advantage is not profound, such results do highlight the importance of developing spatially robust MMA type estimators when the underlying data structure is known to be characterized by weak spatial dependence.
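As a rough illustration, the MSE criterion and the ratios plotted in Figure 1 could be computed as follows. The function names are hypothetical, and each replication is assumed to supply one fitted vector \(\hat{\mu }_{r}\) and one target vector \(\mu _{r}\) of equal length.

```python
def simulation_mse(mu_hat_by_rep, mu_by_rep):
    """Average over replications of (mu_hat - mu)'(mu_hat - mu)."""
    total = 0.0
    for mu_hat, mu in zip(mu_hat_by_rep, mu_by_rep):
        total += sum((a - b) ** 2 for a, b in zip(mu_hat, mu))
    return total / len(mu_hat_by_rep)

def mse_ratio(mse_alternative, mse_smma):
    """Ratio above one: SMMA has the smaller MSE on average."""
    return mse_alternative / mse_smma

# Toy input with two replications (illustrative numbers only).
toy_mse = simulation_mse([[1.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
```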

Fig. 1

Simulation results. Notes: SRC\(_{p}\) denotes the spatial Mallows model averaging method; JMA denotes the jackknife model averaging estimator of Hansen and Racine (2012); HRC\(_{p}\) denotes the heteroskedasticity-robust estimator of Liu and Okui (2013); MMA denotes standard Hansen (2007) Mallows model averaging

Fig. 2

Locations of sample sub-markets overlaid on the greater Auckland urban area. Areas 1, 2, and 3 represent the geographic extent of the Takapuna, Te Atatu, and Howick samples, respectively. The Auckland SkyTower is represented by the black asterisk, acting as the location for the central business district

Hedonic imputation price index construction

Under the hedonic imputation (HI) approach a hedonic regression is fitted to a cross section of sales prices in each time period. The price index is then constructed using the predicted sales prices in conventional price index formulae (such as Fisher Ideal or Case-Shiller Repeat Sales). A key advantage of the HI approach is that it permits the hedonic coefficients on property attributes (such as the number of bedrooms or land area) to change over time.

The standard hedonic specification utilized in the real estate literature is a semi-log specification. As Malpezzi (2008) highlights, use of the log-linear functional form should be preferred on the basis that it explicitly attaches a nonlinear marginal effect of characteristic changes to prices. For instance, we would not expect the price effect of the addition of one bedroom to be linear in the number of bedrooms. A second benefit is that it gives the coefficients an appealing interpretation as the approximate percentage change in price, given a one unit change in a hedonic characteristic. A third advantage is that the log form can mitigate heteroskedasticity in the regression error. The general semi-log hedonic specification used here is given by:

$$\begin{aligned} p_{i(t),t}=x_{i(t),t}^{\prime }\mu (W_{t})_{t}+\epsilon _{i(t),t}, \end{aligned}$$

where \(p_{i(t),t}\) represents the log transformation of the transaction price, \(t=1,...,T\) indexes the time periods, and \(i(t)=1(t),...,n(t)\) indexes the cross sections observed in period t. Because our empirical application is to repeated cross sections of transaction prices, the cross sectional index is dependent on the time period t. For instance, if there are a total of \(n=100\) properties in the sample and \(i=\{8,24,46,69,87\}\) are sold in period t, then \(i(t)=\{8,24,46,69,87\}\) and \(n\left( t\right) =5\). Omitting the t notation gives the false implication that all n properties transacted in period t.

The regression model (13) is estimated in each time period t, meaning that T different regression functions are estimated using cross sectional data. \(\mu (W_{t})_{t}\) represents a vector of (weighted) coefficients (estimated via SMMA) and \(\epsilon _{i(t),t}\) is a residual term. The estimated hedonic coefficients can then be utilized as the basis for an index which is constructed using fitted prices in the period of sale and the period prior to sale:

$$\begin{aligned} \hat{p}_{i\left( t\right) ,t} = x_{i(t),t}^{\prime }\hat{\mu }(W_{t})_{t}, { \ }\hat{p}_{i\left( t\right) ,t-1} = x_{i(t),t}^{\prime }\hat{\mu }(W_{t})_{t-1} \end{aligned}$$

This estimates market prices in period \(t-1\) as a function of properties (and their respective characteristics) observed in period t and the estimated (weighted) hedonic coefficients for period \(t-1\).
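A minimal sketch of this imputation step is given below. The characteristic vector and both coefficient vectors are made-up numbers, and the function names are hypothetical; the only point is that the same period-t characteristics are priced with two different coefficient vectors.

```python
def fitted_log_price(x, coef):
    """Inner product x' * coef, giving a fitted log price."""
    return sum(xk * bk for xk, bk in zip(x, coef))

def imputed_pair(x, coef_t, coef_prev):
    """Return (p_hat_t, p_hat_{t-1}) for one property sold in period t."""
    return fitted_log_price(x, coef_t), fitted_log_price(x, coef_prev)

# Hypothetical property: intercept, bedrooms, land area in 1000 m^2.
x = [1.0, 3.0, 0.5]
coef_t = [12.0, 0.10, 0.40]      # illustrative period-t coefficients
coef_prev = [11.9, 0.12, 0.38]   # illustrative period-(t-1) coefficients
p_t, p_prev = imputed_pair(x, coef_t, coef_prev)
```

The difference `p_t - p_prev` is the fitted log price change that enters the index regression.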

Indices can be constructed using standard chaining methods or repeat sales methods using these fitted sales pairs. One drawback of standard chaining approaches is that they do not reveal any inferential properties of the index estimates: to do so would require the use of bootstrapping techniques, such as those utilized in Pakes (2003) and Beer (2007).Footnote 7 In contrast, repeat sales regressions can be used to back out confidence intervals for price indices since these methods rely on a linear regression to reveal price trends. The estimation procedure is similar to that of Bailey et al. (1963), except that here a growth rate over a single period is considered, since fitted sales pairs are always adjacent to one another in time. This adjacency also eliminates the requirement for a Case and Shiller (1987) type GLS weighting.Footnote 8

Remembering from (13) that estimated prices are expressed in log form, the index specification is as follows:

$$\begin{aligned} \hat{p}_{i(t),t}-\hat{p}_{i(t),t-1}=\gamma _{t}-\gamma _{t-1}+\epsilon _{i(t),t,t-1} \end{aligned}$$

where \(\hat{p}_{i(t),t}\) and \(\hat{p}_{i(t),t-1}\) are defined as above. Note that because both \(\hat{p}_{i(t),t}\) and \(\hat{p}_{i(t),t-1}\) are estimates, this constitutes a “dual” type imputation approach. A key benefit of this approach is to reduce the effects of specification bias in the hedonic setup (Hill 2013).Footnote 9
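Because every fitted pair spans adjacent periods, an unweighted least squares fit of this regression reduces to averaging the fitted log differences within each period and cumulating them. The sketch below relies on that simplification, together with an assumed normalisation \(\gamma _{0}=0\); it is not the weighted estimator or the HAC inference used in the paper, and the function name is hypothetical.

```python
import math

def hi_index(pairs_by_period):
    """Build index levels from fitted log-price pairs.

    pairs_by_period[k] holds (p_hat_t, p_hat_prev) tuples for the properties
    sold in period t = k + 1; period 0 is the base period.
    """
    gamma = [0.0]  # assumed normalisation: gamma_0 = 0
    for pairs in pairs_by_period:
        growth = sum(p_t - p_prev for p_t, p_prev in pairs) / len(pairs)
        gamma.append(gamma[-1] + growth)
    return [math.exp(g) for g in gamma]

# Two periods of illustrative fitted pairs.
index = hi_index([[(0.1, 0.0), (0.3, 0.2)], [(0.5, 0.3)]])
```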

Following estimation of the hedonic index represented by (14), asymptotic standard errors can be estimated for the fitted parameters \(\left\{ \hat{\gamma }_{t}\right\} _{t=1}^{T}\). The standard errors can then be used to calculate confidence intervals for the respective index estimates in order to evaluate the accuracy of the index.Footnote 10 Given the nature of the dataset in that both time series and cross sectional spatial autocorrelation are likely to be present, these standard errors are estimated using the Kim and Sun (2013) spatiotemporal HAC estimator.

Note that (14) uses fitted values as the regressands in the equation. Thus the accuracy of the index hinges on the accuracy of the predicted prices. This thinking motivates the use of the SMMA method, since it is designed to select weights that minimize the quadratic loss of the prediction. We therefore expect the SMMA approach to yield a price index that is more accurate than other model selection methods that rely on either ad hoc approaches or model selection criteria, such as BIC. We demonstrate that this is indeed the case in our empirical application.


In this section we apply the SMMA-based hedonic imputation index to three suburbs in the Auckland region. The benefits of model averaging are expected to be more apparent in these localized geographies because the bias-variance trade-off in selecting a model size is likely to be more acute in smaller samples. The three suburbs are selected to represent vastly different demographic and dwelling stock profiles. We compare the method to standard OLS based methods for constructing hedonic imputation based indices.


Our dataset consists of sales transactions for the whole of Auckland and spans Quarter 1 1990 to Quarter 4 2017 (inclusive). We select three suburban markets from this core dataset for analysis. These suburbs represent diverse property sub-markets from across Auckland and are based on Statistics New Zealand statistical area units (SAU).Footnote 11 The three areas, Takapuna, Te Atatu, and Howick, are outlined in Fig. 2. The three suburbs have an average of 177, 191, and 226 observations, respectively, in each of the 112 quarterly time periods within the full sample.

Table 1 Descriptive Statistics of Residential Sales Transaction Data

The sub-markets are selected to represent three relatively similar-sized suburbs that markedly differ in terms of their housing stock and socioeconomic demographics. This allows us to consider the performance of indices across small markets that are characterized by substantive differences in housing stocks and housing demand. 2001 census data show the Takapuna, Te Atatu and Howick samples have an average household income approximately \(104\%\), \(89\%\), and \(113\%\) of the average household income in the Auckland region, respectively. Table 1 provides the descriptive statistics for each sub-market and highlights the key underlying differences in the nature of the housing stock in each suburb.

Takapuna has a diverse housing stock. It encompasses the historic southern area of the peninsula (houses in which are subject to heritage provisions to prohibit teardown and replacement) and more modern northern areas. To split these areas would preclude the construction of reliable hedonic imputation indices due to the relatively low observation counts in these different sub-regions. In addition, many of the dwellings on the peninsula have desirable sea views. Thus it represents a key area of interest for price measurement techniques as the diversity of housing stock is likely to result in non trivial shifts in sales transaction composition across different years. In the case of mean and median type price measurement, this is likely to result in higher levels of price growth volatility, which obscures the true underlying price movements.

A large portion of the housing stock in Te Atatu also lies on a peninsula. However, unlike Takapuna, the housing stock is relatively more homogeneous in nature, as it lacks a significant historic district, and the majority of the build-out occurred as a result of a motorway extension in the 1960s. Table 1 demonstrates that the Te Atatu area represents the least valuable market of the three samples, reflecting the socioeconomic profile of the location.

The Howick sample represents a relatively newer suburb that is a substantial distance from the central business district. It is however similar to Takapuna in terms of average building characteristics and coastal amenity. There is also less diversity in building characteristics, implying a relatively more homogeneous housing stock than the Takapuna area. Furthermore, the relatively lower average age of the housing stock implies a lesser requirement for extensive renovations or reconstruction of housing stock.

The dataset provides a wide range of information for each property transaction. This includes SAU membership, sales price data, government valuation data, land size, building size, number of bedrooms and bathrooms, and various other qualitative and quantitative property characteristics. In addition, the dataset is geo-referenced with World Geodetic System (WGS) 1984 coordinates, permitting the estimation of spatial relationships such as distance to known amenities and pairwise property distances for the SMMA method.
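The notes to the paper state that distances are computed with a haversine formula. A minimal sketch of such a calculation from latitude and longitude pairs follows; the mean Earth radius of 6371 km and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def pairwise_distances(points):
    """Symmetric matrix of distances for a list of (lat, lon) tuples."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = haversine_km(*points[i], *points[j])
    return dist
```

The same function serves both uses mentioned above: distance from each property to a fixed amenity, and the pairwise distance matrix between properties.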

Table 1 provides descriptive statistics on the candidate set of variables used in the hedonic regressions. The set of explanatory variables includes characteristics that are frequently used in the hedonic house price literature. These are land area, structure floor area, structure site coverage, age, the number of bedrooms and bathrooms, and the distance to the central business district.Footnote 12 We include dummy indicators for building condition, multi-storey dwellings, and appreciable views. We also include information on the redevelopment potential of the property. Clapp et al. (2012) argue that hedonic specifications that fail to control for the redevelopment premium are incorrectly specified. Changes in the value of these premiums are a particularly important source of variation in house prices in our sample due to changes in residential zoning in Auckland in 2016 (see Greenaway-McGrevy et al. 2021). Following Clapp et al. (2012) we therefore include a proxy for the redevelopment premium, site intensity, which is the ratio of the estimated value of improvements of a property to the total estimated value.Footnote 13 To capture how land use regulations affect the redevelopment premium, we include dummy indicators for different zoning designations and interact these zoning dummies with site intensity. Finally, we include indicators for the suburb of the dwelling. This yields a total of sixteen variables in the set of explanatory regressors. The ad hoc hedonic approach includes all sixteen in the regression specification. The set of potential regressors could be expanded by considering linear transformations and interactions between regressors. We do not consider this extension in order to keep the analysis tractable.

The sequence of models necessary for model averaging contains the locational fixed effects as the first model, and the remaining fifteen characteristic variables are ordered according to the method used by Hansen (2007). That is, a regression is performed on the mean and variance standardized variables. The variables are then ranked according to the magnitude of the coefficients from this regression (largest to smallest). Following Hansen (2014) the variables are then also partitioned into candidate models consisting of groupings of three regressors each (excluding the locational fixed effects, which are contained in a single candidate model). This results in six possible candidate models in the MMA estimation procedure. For the spatial kernels required in the computation of SMMA (see (10)), we use a distance bandwidth of 5 kilometers (due to the relatively small geographic area of the suburbs studied).
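The construction of the nested candidate sequence described above can be sketched as follows. The regressor names and the standardized coefficients are placeholders, and the helper name is hypothetical; only the ranking and grouping logic follows the text.

```python
def nested_candidate_models(std_coefs, fixed_effects, group_size=3):
    """Build nested candidate models from ranked regressors.

    std_coefs maps regressor name -> coefficient from a regression on
    mean and variance standardized variables. Regressors are ranked by
    absolute coefficient (largest first) and added in groups.
    """
    ranked = sorted(std_coefs, key=lambda name: abs(std_coefs[name]),
                    reverse=True)
    models = [list(fixed_effects)]  # first model: locational fixed effects
    for start in range(0, len(ranked), group_size):
        models.append(models[-1] + ranked[start:start + group_size])
    return models

# Fifteen placeholder regressors with made-up standardized coefficients.
std_coefs = {f"x{k}": (-1) ** k * float(k) for k in range(1, 16)}
models = nested_candidate_models(std_coefs, ["suburb_fe"])
```

With fifteen ranked regressors in groups of three plus the fixed-effects model, the sequence contains six candidate models, matching the count given in the text.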

The data is also filtered in order to remove poorly coded data and outliers. Observations with sale prices below \(\$10,000\) (NZ dollars) and above \(\$5,000,000\) are removed along with those listed as having less than 10m\(^{2}\) and greater than 500m\(^{2}\) of floor space, or over 2 hectares of land. Observations listed with missing data in any of the pertinent characteristic vectors are also removed. Finally, observations for sales of a property that occur twice in a single quarterly period are removed to reduce the effect of non arms-length transactions on estimated market prices.Footnote 14
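These filters translate into a short routine. The record layout and field names below are hypothetical, land area is assumed to be recorded in square metres (so 2 hectares is 20,000 m²), and a property sold more than once in a quarter is assumed to have all of its sales in that quarter dropped.

```python
from collections import Counter

REQUIRED = ("property_id", "quarter", "price", "floor_area_m2", "land_area_m2")

def filter_sales(sales):
    """Apply the price, size, missing-data and same-quarter resale filters."""
    kept = [
        s for s in sales
        if all(s.get(key) is not None for key in REQUIRED)
        and 10_000 <= s["price"] <= 5_000_000
        and 10 <= s["floor_area_m2"] <= 500
        and s["land_area_m2"] <= 20_000
    ]
    # Count sales per property and quarter among the records kept so far.
    counts = Counter((s["property_id"], s["quarter"]) for s in kept)
    return [s for s in kept if counts[(s["property_id"], s["quarter"])] == 1]

sales = [
    {"property_id": 1, "quarter": "2001Q1", "price": 350_000,
     "floor_area_m2": 120, "land_area_m2": 600},
    {"property_id": 2, "quarter": "2001Q1", "price": 5_000,
     "floor_area_m2": 90, "land_area_m2": 500},    # price too low
    {"property_id": 3, "quarter": "2001Q2", "price": 400_000,
     "floor_area_m2": None, "land_area_m2": 500},  # missing floor area
    {"property_id": 4, "quarter": "2001Q3", "price": 300_000,
     "floor_area_m2": 100, "land_area_m2": 400},   # sold twice in quarter
    {"property_id": 4, "quarter": "2001Q3", "price": 320_000,
     "floor_area_m2": 100, "land_area_m2": 400},
]
cleaned = filter_sales(sales)
```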

Although the ad hoc, BIC, and SMMA HIs are based on the same set of regressors, the BIC method omits some regressors altogether, while the SMMA method down-weights some regressors relative to others. The ad hoc method includes all regressors with equal weights. Across the 112 time periods, the average OLS R-squared values for the hedonic model containing all regressors are 0.82, 0.81, and 0.78 for the Takapuna, Te Atatu, and Howick samples, respectively.
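The contrast between the three approaches can be made concrete by viewing each as a weight vector over the nested candidate models. In the sketch below the coefficient values and weights are invented, the function name is hypothetical, and the weights are assumed to be non-negative and to sum to one.

```python
def average_coefficients(model_coefs, weights):
    """Weighted average of nested model coefficients.

    model_coefs[m] lists the coefficients of candidate model m; regressors
    a smaller model leaves out are treated as having a zero coefficient.
    """
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    size = max(len(b) for b in model_coefs)
    averaged = [0.0] * size
    for coefs, w in zip(model_coefs, weights):
        for j, b in enumerate(coefs):
            averaged[j] += w * b
    return averaged

model_coefs = [[1.0], [0.8, 0.5], [0.7, 0.4, 0.2]]  # illustrative values
averaging = average_coefficients(model_coefs, [0.2, 0.3, 0.5])  # down-weights
selection = average_coefficients(model_coefs, [0.0, 1.0, 0.0])  # one model
ad_hoc = average_coefficients(model_coefs, [0.0, 0.0, 1.0])     # full model
```

Selection sets the omitted regressor's coefficient to exactly zero, averaging shrinks it towards zero, and the ad hoc approach leaves it at its full-model estimate.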

Price indices

We begin by illustrating the conventional mean and median price indices which are often reported at the suburban level. Figure 3 exhibits price indices normalized to unity in Q1 1990. The Takapuna mean and median indices are volatile and display the most price growth, consistent with the prices listed in the descriptive statistics and the relatively heterogeneous housing stock in the geographic area. As Hill (2013) highlights, regions with heterogeneous housing stock are likely to experience shifts in the quality distribution of the housing stock sold in each period. This can result in the conflation of price changes with quality differences in each period, ultimately leading to the large and noisy estimates of price change evident here. Interestingly, we see that overall price movement trends are very consistent over the three suburban markets and each index displays key turning points in the periods 2008, 2012, and 2016. These movements are also highly correlated with headline price indices for Auckland city as a whole. This indicates that while the magnitude of price growth differs, the sub-samples appear to be characterized by common market drivers.

Fig. 3

Mean and Median Price Indices

Fig. 4

Hedonic Imputation Price Indices

Next we consider the HI indices for the three suburbs. Figure 4 exhibits these indices. Immediately evident is that the three types of HI indices track each other, with minor divergences in index levels. In each case the OLS approach produces estimates with slightly higher levels, which become more pronounced in the later periods. As Table 2 highlights, this naturally results in marginally lower overall average growth estimates for the SMMA based HI indices. Table 2 also demonstrates that overall volatility of quarterly growth estimates implied by these indices is universally lower for the SMMA HI indices.

Comparing the HI indices to the mean and median indices immediately highlights the risk of relying on mean and median indices in smaller sample sizes. Both the figures and Table 2 demonstrate the constant-quality HI indices to be substantially less volatile than their mean and median counterparts. Furthermore, we see that in the Takapuna sample, the mean and median indices are substantially upwards biased. This is likely the result of quality improvements, such as renovations, on old(er) individual properties. Such improvements are likely to inflate mean and median price growth estimates once the improvements are capitalized into transaction prices.

The HI indices for Te Atatu also demonstrate a rapid and substantial price appreciation in the quarterly periods between 2012 and 2016. This coincides with the general upwards trend in Auckland; however, the magnitude is substantially larger than that seen in the other two suburbs. Possibly driving this growth is an increase in demand for more affordable houses at the lower end of the price distribution. Interestingly, the drop-off in values following the 2016 period implied by the mean and median indices is not seen in the HI indices. Instead we see a plateau more consistent with the other samples. This further demonstrates the importance of constant-quality price measurement, as it shows that average and median prices are dropping due to a compositional shift towards lower value housing, not due to individual houses decreasing in value.

In the Howick sample the SMMA and OLS HI indices are almost indistinguishable. What this implies is that there is an overall greater convergence in the OLS and (weighted) SMMA hedonic coefficients. This could be the result of greater consistency in the housing stock composition over time. In contrast to the Takapuna sample, the levels of the mean, median and HI indices are relatively similar. The implication is that there appears to be a low level of bias arising from composition changes and quality changes. The lesser amount of quality change is somewhat expected in the case of newer housing stock as there is less need for renovation or replacement work, which can bias price growth estimates upwards.

There is also evidence to suggest that the hedonic specification should remain flexible over the estimation period. Table 3 provides the average weights assigned to each of the candidate models, alongside variances. These variances can be substantial, indicating that the hedonic specification should not be held fixed over long time periods. Instead, flexibility is critical in order to adapt to changes in the joint distribution of housing characteristics over time. It is also important to note that the differences between the OLS and SMMA HI indices are directly attributable to the nature of these weights. Essentially, as the final candidate model receives more weight, the two sets of index estimates converge.

Table 2 Summary Statistics of Price Indices (quarterly percent changes)
Table 3 Spatial MMA weights

Index evaluation

In this subsection we evaluate the accuracy of the SMMA HI index and compare it to other relevant HI benchmarks. Overall, we show SMMA HI to represent the best all-round estimation approach for small sample hedonic imputation index construction. Our first metric of index accuracy is based on the confidence intervals around the estimated price index from (14). The tighter the confidence interval, the greater the precision of the price index estimate when evaluated by variance. However, adopting variance as the metric of precision ignores potential bias in the predicted house price inflation rates that are used as dependent variables in (14). We therefore consider a second measure of precision that is based on the ability of the index to forecast out of sample house price growth at the level of the individual property. The measure of predictive ability we adopt, root mean square forecast error, reflects both the bias and the variance of the predicted house prices.

Index confidence intervals

Regression (14) provides a direct measure of index accuracy because index growth rates are estimated parameters obtained from a regression model. We can therefore assess index accuracy based on the confidence intervals for the estimated parameters \(\left\{ \hat{\gamma } _{t}\right\} _{t=1}^{T}\). It is straightforward to obtain standard errors for fitted parameters since they are obtained via (weighted) OLS estimation. The tighter the confidence interval, the more accurate the estimated parameter. We employ a HAC estimator that is robust to spatiotemporal dependence in the error term when constructing standard errors. (Refer to the Appendix for details.) Figure 5 exhibits the confidence intervals for each of the three suburbs. We express the confidence intervals as ratios to the SMMA confidence interval to facilitate a comparison of the three methods.

SMMA consistently produces tighter confidence intervals for index growth estimates.Footnote 15 Furthermore, this advantage of SMMA HI appears greater in samples where data is sparse and the housing stock is relatively heterogeneous. The Takapuna sample, with its highly heterogeneous housing stock, appears to benefit the most from the SMMA procedure. On average, the relative difference in the OLS and SMMA confidence intervals is \(9.04\%\). Given that confidence intervals operate symmetrically around a point estimate, this translates to a roughly \(18\%\) decrease in the range between the upper and lower confidence bounds. Notably the BIC confidence interval trends upwards over time, indicating that BIC HI accuracy experiences a marked decline over the sample period. This is also consistent with the observed divergence between the BIC HI index and the ad hoc OLS and SMMA indices towards the end of the sample (see Fig. 4). In Te Atatu, we observe a similar ranking in performance: SMMA has the tightest confidence interval, followed by OLS and BIC. Notably the BIC confidence interval does not grow larger over the sample period, but nonetheless remains 7.5% larger than that of the SMMA HI. The same ordering is present in the Howick sample. Note however that the OLS HI has a confidence interval that is about 2% larger than that of the SMMA HI. Given the larger number of sales observations in the Howick sample, this perhaps reflects the fact that the trade-off between misspecification bias and overfitting variance is less acute when there are more data available. The SMMA advantage relative to the BIC HI is much larger.

Fig. 5

Confidence Intervals for various Hedonic Imputation Price Index methods. Note: Confidence intervals expressed as ratio of the SMMA HI confidence interval

Out-of-sample price growth prediction

Jiang et al. (2015) and Hill et al. (2017) argue that price indices should be evaluated through measures of predictive ability. Accordingly, we implement a simple test of out of sample prediction accuracy of the three different HI methods and the mean and median indices. We follow the general approach used by Jiang et al. (2015) and Hill et al. (2017) who bifurcate the sample of transactions into a training set and an evaluation set. Price indices are constructed using the training set. These price indices are then used to predict price growth for transactions in the evaluation set. The full procedure is outlined below:

1. For each of the Takapuna, Te Atatu, and Howick samples, properties with two (single repeat) or more (multiple repeat) sales transactions are identified. Every single repeat sales pair of transactions is assigned to the evaluation set. Along with these, the chronologically closest sales pair from the multiple repeat properties are also added to the evaluation set.Footnote 16 The evaluation set contains n(E) observations.

2. The training set contains all remaining transactions not assigned to the evaluation set. It contains \(n(T)=n-n(E)\) observations. The training sets for Takapuna, Te Atatu, and Howick contain \(71\%\), \(72\%\) and \(74\%\) of all transactions, respectively.

3. OLS, BIC and SMMA hedonic imputation indices are then estimated on the training set data for the Takapuna, Te Atatu, and Howick areas.

4. A prediction of the second transaction price in each sales pair in the evaluation set is made utilizing the OLS, BIC and SMMA HI indices calculated from the training set. Root mean square prediction error (RMSPE) is used to benchmark the accuracy of these estimates against price growth estimated from actual transaction prices.
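The first two steps of the procedure can be sketched as below. The tuple layout and function name are hypothetical, periods are assumed to be integers, and "chronologically closest" is read here as the two consecutive sales of a property with the shortest gap between them.

```python
from collections import defaultdict

def split_training_evaluation(sales):
    """Split (property_id, period, price) tuples into training sales and
    evaluation pairs, one pair per repeat-sale property."""
    by_property = defaultdict(list)
    for sale in sales:
        by_property[sale[0]].append(sale)

    training, evaluation_pairs = [], []
    for history in by_property.values():
        history.sort(key=lambda s: s[1])
        if len(history) < 2:
            training.extend(history)  # never resold: training only
            continue
        # Consecutive sales with the smallest time gap form the pair.
        k = min(range(len(history) - 1),
                key=lambda i: history[i + 1][1] - history[i][1])
        evaluation_pairs.append((history[k], history[k + 1]))
        training.extend(history[:k] + history[k + 2:])
    return training, evaluation_pairs

sales = [("a", 1, 100.0), ("b", 2, 200.0), ("b", 10, 300.0),
         ("c", 1, 100.0), ("c", 8, 150.0), ("c", 9, 160.0)]
training, evaluation_pairs = split_training_evaluation(sales)
```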

The formula for RMSPE is:

$$\begin{aligned} RMSPE^{j}=\sqrt{\frac{1}{n(E)^{j}}\sum _{i=1}^{n(E)^{j}}\left( \ln (\frac{ P_{i,t}}{P_{i,s}})-\ln (\frac{\hat{I}_{t}^{z}}{\hat{I}_{s}^{z}})\right) ^{2}} \end{aligned}$$

where j indexes the sub-markets Takapuna, Te Atatu, and Howick. \(P_{i,t}\) and \(P_{i,s}\) denote observed prices at time t and time \(s<t\). \(\hat{I}_{t}^{z}\) represents the estimated index z (OLS, BIC, SMMA) at time t and \(\hat{I}_{s}^{z}\) is the estimated index at time \(s<t\). For the HI methods, \(\hat{I}_{t}^{z}=\exp ( \hat{\gamma }_{t}^{z})\) from estimation of (14) using the training set of sales transaction observations. The mean and median price indices are calculated based on the training set. Note that all other notation is as previously defined.
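A direct transcription of the RMSPE formula is given below as a sketch. The input layout (a list of sale pairs and a mapping from period to index level) and the numbers used are assumptions made for illustration.

```python
import math

def rmspe(pairs, index):
    """Root mean square prediction error of index-implied price growth.

    pairs: list of (s, t, price_s, price_t) with s < t.
    index: mapping from period to estimated index level.
    """
    squared_errors = [
        (math.log(price_t / price_s) - math.log(index[t] / index[s])) ** 2
        for s, t, price_s, price_t in pairs
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

index = {0: 1.0, 1: 1.1, 2: 1.21}            # illustrative index levels
exact = rmspe([(0, 1, 100.0, 110.0), (0, 2, 100.0, 121.0)], index)
flat = rmspe([(0, 1, 100.0, 100.0)], index)  # price did not move
```

When observed growth matches index growth the error is zero; a property whose price stays flat while the index rises 10% contributes an error of \(\ln 1.1\).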

Table 4 Out-of-sample Forecast Results
Table 5 Diebold-Mariano tests of equal predictive accuracy

Table 4 presents the results. Table entries are the ratio of the RMSPE of each method to that of SMMA HI, such that a ratio greater than one indicates that the SMMA has a lower RMSPE. With a single exception, tabulated entries are greater than one, indicating that the use of model averaging in hedonic imputation produces out of sample predictions superior to those of an ad hoc OLS HI approach, the BIC HI approach, or mean and median based methods. The exception is Howick, for which BIC HI has a smaller RMSPE than SMMA HI.

We employ Diebold-Mariano tests of equal predictive accuracy to assess the statistical significance of these results. DM statistics are given in Table 5. We present the standard DM test statistic alongside a test statistic based on a spatiotemporally robust estimate of variance. (See the Appendix for details on the estimation of the Diebold-Mariano test corrected for spatiotemporal dependence in the forecast series.) The difference between the SMMA and OLS RMSPE is statistically significant. Comparing the RMSPE of the SMMA and BIC forecasts, only the results for Howick are statistically significant.
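For orientation, the standard Diebold-Mariano statistic under squared-error loss can be sketched as follows. This version assumes the loss differentials are uncorrelated across observations, so it does not include the spatiotemporal variance correction described in the Appendix; the function name and inputs are hypothetical.

```python
import math

def diebold_mariano(errors_a, errors_b):
    """DM statistic for equal predictive accuracy under squared-error loss.

    Positive values indicate that forecast A has the larger average loss.
    Assumes independent loss differentials (no HAC correction).
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Illustrative prediction errors from two competing indices.
dm = diebold_mariano([1.0, 2.0, 3.0, 2.0], [1.0, 1.0, 2.0, 2.0])
```

Under the null of equal accuracy the statistic is compared with standard normal critical values.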


This paper proposes a novel approach to constructing hedonic imputation price indices using a new type of Mallows model averaging estimator that is robust to spatial correlation (SMMA). The key advantage of the SMMA HI approach is that, unlike OLS approaches, it is explicitly designed to balance specification error against overfitting in determining the specification of the auxiliary hedonic regression function. To date, the extant literature on hedonic indices has been largely silent on the issue of optimal model size, with advice limited to ad hoc approaches.Footnote 17 The new method breaks new ground in this understudied area and provides a data-dependent mechanism for determining the specification of the hedonic model to assist practitioners in constructing more accurate hedonic indices.

To showcase the new method, we apply it to a handful of suburbs in Auckland, New Zealand, evaluating the accuracy of the method relative to relevant alternatives. Our results highlight the benefits of using model averaging when constructing hedonic imputation indices in sparse data environments. SMMA HI indices exhibit tighter confidence intervals than OLS estimates based on an ad hoc approach to regressor selection, and produce superior out of sample predictions of price growth. Our results also indicate that these advantages are greatest for housing markets characterized by a relatively heterogeneous housing stock. We also compare SMMA HI indices to HI indices based on BIC determination of model specification. While we also highlight that a BIC model selection approach may be a viable solution to the model selection issue in hedonic imputation, we conclude that, overall, the SMMA should be preferred as the better all-round approach. This is on the basis that, while the out of sample performance of the SMMA and BIC HI approaches is very similar, the SMMA approach maintains tighter confidence intervals on index estimates and produces less volatile price indices.

The advantages of a SMMA HI are likely to be particularly noticeable in small sample environments such as tight geographic areas or at high sampling frequencies (e.g. monthly or weekly indices). Future research could focus on the further development of spatially robust MMA type estimators in order to expand the scope of model averaging applications, and explore index performance in a broader geography but at a higher frequency.


  1. 1.

    To the best knowledge of the authors, this feature of hedonic imputation has heretofore been overlooked by in the real estate literature.

  2. 2.

    A facetious model that includes only a constant for each time period illustrates this point. When estmated by OLS, this amounts to reporting the average house price each month, quarter or year.

  3. 3.

    Also see Table 2 in Bourassa et al. (2006) for a detailed comparison of the strengths and weaknesses of the repeat sales, Sales-Price to Appraisal Ratio (SPAR), and hedonic index methods.

  4. 4.

    Note that \(\sum _{m=1}^{M}w_{m}k_{m}\) also denotes the effective number of parameters in model averaging estimation. This is not restricted to be an integer, as in the case of the standard OLS estimator.

  5. 5.

    We use a minimum spherical distance estimate—see the Appendix Alternative measures of distance could also be considered here, such as Manhattan (using detailed geospatial data).

  6. 6.

    Also note that only the nested case is considered here. While non-nested cases are possible, it is important to remember that the number of possible models is growing exponentially in \((m-1)\) (in the case of regressor groupings being restricted in number to 1) which results in substantial computational burden when spatial correlations also require estimation.

  7. 7.

    In essence a hedonic index estimate should be considered a random variable, due to the fact that prices are estimated on the basis of a sample of observations. However, except for the work of Pakes (2003), Beer (2007), and Diewert et al. (2009), little research has focussed on inference for price indices.

  8. 8.

    The Case and Shiller (1987) approach implements a weighting procedure which penalises greater lengths in time between sales transactions of a single property in order to mitigate issues of quality change.

  9.

    The single imputation index references observed prices in period t (\({p} _{i(t),t}\)) against estimated prices in period \(t-1\) (\(\hat{p}_{i(t),t-1}\)).

  10.

    For example, these can be calculated for the \(95\%\) interval as \(\hat{\gamma }_{t}\pm 1.96\cdot \sqrt{\widehat{var}\left\{ \hat{\gamma }_{t}\right\} }\).

  11.

    SAUs in an urban area are typically constituted of a collection of city blocks and nest statistical meshblocks. Meshblocks (MB) are the lowest level of statistical area designation in New Zealand and generally constitute a city block in an urban setting. The designations supplied in the dataset are based on the 2006 geographic boundaries. Each of the three sub-markets was based on aggregations of these SAUs.

  12.

    Age is estimated as the difference between the sale year and the midpoint of the decade of construction, owing to the fact that very few observations have detail on the exact year of construction. Distances are calculated using a ‘haversine’ formula, which provides the minimum (spherical) distance between two sets of coordinates.

  13.

    These valuations are obtained from local government valuations used for the purpose of levying local property taxes.

  14.

    It is assumed that in order to buy and sell a property within a single quarter, one or both transactions will likely represent a non-arm's-length transaction. Hill et al. (2017) note that exclusion of these sales is standard practice in the construction of house price indices.

  15.

    However, in unreported results we found that the LO method produced tighter confidence intervals than SMMA in almost all suburbs and time periods. Since the SMMA nests the LO method as a special case, allowing for spatial dependence in addition to heteroskedasticity produced no advantage in this set of applications.

  16.

    This is similar to the approach of Hill et al. (2017), whereas Jiang et al. (2015) utilised only the final sales pair of properties with multiple repeat sales. Using the Jiang et al. (2015) approach tends to disproportionately allocate transactions from later time periods to the evaluation set. In the case of hedonic imputation, this can be particularly problematic due to an erosion of degrees of freedom. It also has the effect of reducing any possible effects of quality change over time.

  17.

    Malpezzi (2008; p. 83) states “whenever sample sizes are small, and especially if the application will involve some prediction out of sample (as with, say, pricing rent-controlled or subsidized units), it is often best to stick to a simple parsimonious specification”.


  1. Anselin L (1988) Spatial econometrics: methods and models. Springer, New York


  2. Anselin L (2010) Thirty years of spatial econometrics. Pap Region Sci 89(1):3–25


  3. Akgun O, Pirotte A, Urga G, Yang Z (2020) Equal predictive ability tests for panel data with an application to OECD and IMF forecasts, papers 2003.02803.

  4. Bailey MJ, Muth RF, Nourse HO (1963) A regression method for real estate price index construction. J Am Stat Assoc 58(304):933–942


  5. Barnard GA (1963) New methods of quality control. J Roy Stat Soc 126:255–259


  6. Bates JM, Granger CWJ (1969) The combination of forecasts. Oper Res Quart 20:319–325


  7. Beer M (2007) Bootstrapping a Hedonic Price Index: experience from used cars data. Adv Stat Anal 91(1):77–92


  8. Bokhari S, Geltner D (2012) Estimating real estate price movements for high frequency tradable indexes in a scarce data environment. J Real Estate Finance Econ 45:522–543


  9. Bollerslev T, Patton AJ, Wang W (2016) Daily house price indices: construction, modeling, and longer-run predictions. J Appl Econ 31(6):1005–1025


  10. Bourassa SC, Hoesli M (2017) High-frequency house price indexes with scarce data. J Real Estate Lit 25(1):207–220


  11. Bourassa SC, Hoesli M, Sun J (2006) A simple alternative house price index method. J Hous Econ 15:80–97


  12. Case K, Shiller R (1987) Prices of single family homes since 1970: new indexes for four cities. New Engl Econ Rev 45–56

  13. Case B, Polakowski HO, Wachter SM (1991) On choosing among house price index methodologies. Real Estate Econ 19(3):286–307


  14. Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Leiden


  15. Clapp JM, Giaccotto C (1992) Estimating price indices for residential property: a comparison of repeat sales and assessed value methods. J Am Stat Assoc 87(418):300–306


  16. Clapp JM, Salavei Bardos K, Wong SK (2012) Empirical estimation of the option premium for residential redevelopment. Region Sci Urban Econ 42:240–256


  17. Conley TG (1999) GMM estimation with cross sectional dependence. J Econ 92:1–45


  18. Cotteleer G, Stobbe T, Cornelis van Kooten G (2011) Bayesian model averaging in the context of spatial hedonic pricing: an application to farmlands. J Region Sci 51(3):540–557


  19. Diewert E, Greenlees J, Hulten C (2009) Hedonic imputation versus time dummy hedonic indexes. In: Price index concepts and measurement. National Bureau of Economic Research, pp 161–196

  20. Elhorst JP (2010) Applied spatial econometrics: raising the bar. Spat Econ Anal 5(1):9–28


  21. Eurostat (2013) Handbook on residential property prices indices (RPPIs). Eurostat methodologies and working papers. Publications Office of the European Union, Luxembourg

  22. Fleming D, Grimes A, Lebreton L, Mare DC, Nunns P (2018) Valuing sunshine. Region Sci Urban Econ 68:268–276


  23. Goh YM, Costello G, Schwann G (2012) Accuracy and robustness of house price index methods. Housing Stud 27(5):643–666


  24. Granger CWJ, Ramanathan R (1984) Improved methods of combining forecasts. J Forecast 3(2):197–204


  25. Greenaway-McGrevy R (2021) Forecast combination for VARs in large N and T Panels. Int J Forecast (forthcoming)

  26. Greenaway-McGrevy R, Pacheco G, Sorensen K (2021) The effect of upzoning on house prices and redevelopment premiums in Auckland, New Zealand. Urban Stud 58(5):959–976


  27. Hansen J (2006) Australian house prices: a comparison of hedonic and repeat sales measures. Research discussion paper 2006-03. Reserve Bank of Australia

  28. Hansen B (2007) Least squares model averaging. Econometrica 75(4):1175–1189


  29. Hansen B (2008) Least-squares forecast averaging. J Econ 146:342–350


  30. Hansen B (2014) Model averaging, asymptotic risk, and regressor groups. Quant Econ 5:495–530


  31. Hansen BE, Racine JS (2012) Jackknife model averaging. J Econ 167:38–46


  32. Helbich M, Brunauer W, Vaz E, Nijkamp P (2014) Spatial heterogeneity in hedonic house price models: the case of Austria. Urban Stud 51(2):390–411


  33. Hill RJ (2013) Hedonic price indexes for residential housing: a survey, evaluation and taxonomy. J Econ Surv 27(5):879–914


  34. Hill RJ, Melser D (2008) Hedonic imputation and the price index problem: an application to housing. Econ Inq 46(4):593–609


  35. Hill RJ, Scholz M (2017) Can geospatial data improve house price indexes? A hedonic imputation approach with splines. Review Income Wealth 64:737–756


  36. Hill RC, Melser D, Syed I (2009) Measuring a boom and bust: the Sydney housing market 2001–2006. J Hous Econ 18:193–205


  37. Hill RJ, Rambaldi AN, Scholz M (2017) Weekly hedonic house price indices: an imputation approach from a spatio-temporal model. In: Proceedings of the 34th general conference of the association for research on income and wealth

  38. Holly S, Pesaran MH, Yamagata T (2010) A spatio-temporal model of house prices in the USA. J Econ 158:160–173


  39. Jiang L, Phillips P, Yu J (2015) New methodology for constructing real estate price indices applied to the Singapore residential market. J Bank Finance 61(2):121–131


  40. Kim MS, Sun Y (2013) Heteroskedasticity and spatiotemporal dependence robust inference for linear panel models with fixed effects. J Econ 177:85–108


  41. Lee L (2002) Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econ Theory 18(2):252–277


  42. LeSage JP, Parent O (2007) Bayesian model averaging for spatial econometric models. Geographical Anal 39:241–267


  43. Liang H, Zou G, Wan A, Zhang X (2011) Optimal weight choice for frequentist model average estimators. J Am Stat Assoc 106:495


  44. Liao J, Zou G, Yan G (2019) Spatial Mallows model averaging for geostatistical models. Can J Stat 47(3):336–351


  45. Liu Q, Okui R (2013) Heteroscedasticity-robust Mallow’s model averaging. Econ J 16:463–472


  46. Liu Q, Okui R, Yoshimura A (2016) Generalized least squares model averaging. Econ Rev 35(8):1692–1752


  47. Mallows C (1973) Some comments on CP. Technometrics 15(4):661–675


  48. Malpezzi S (2008) Hedonic pricing models: a selective and applied review. Housing Economics and Public Policy, pp 67–89

  49. Moral-Benito E (2015) Model averaging in economics: an overview. J Econ Surv 29(1):46–75


  50. Pace KR, Barry R, Sirmans CF (1998) Spatial statistics and real estate. J Real Estate Finance Econ 17(1):5–13


  51. Pakes A (2003) A reconsideration of hedonic price indexes with an application to PC’s. Am Econ Rev 93(5):1578–1596


  52. Shiller RJ (2008) Derivatives markets for home prices. Working Paper 13962. National Bureau of Economic Research

  53. Silver M (2016) How to better measure hedonic residential property price indexes. Working Paper 16/312. International Monetary Fund

  54. Silver M, Heravi S (2007) The difference between hedonic imputation indices and time dummy hedonic indices. J Bus Econ Stat 25(2):239–246


  55. Sirmans GS, Macpherson DA, Zietz EN (2005) The composition of hedonic pricing models. J Real Estate Lit 13(1):3–43


  56. Steel M (2017) Model averaging and its use in economics. J Econ Lit 58(3):644–719


  57. Triplett J (2006) Handbook on hedonic indexes and quality adjustments in price indexes. OECD Publishing, Paris


  58. Wan A, Zhang X, Zou G (2010) Least squares model averaging by Mallow’s criterion. J Econ 156:277–283


  59. Zhang X, Ullah A, Zhao S (2016) On the dominance of Mallows model averaging estimator over ordinary least squares estimator. Econ Lett 142:69–73


  60. Zhang X, Yu J (2018) Spatial weights matrix selection and model averaging for spatial autoregressive models. J Econ 203(1):1–18




The authors thank Arthur Grimes and seminar participants at the 2020 meeting of the New Zealand Association of Economists for their comments on the paper. This work was supported in part by the Marsden Fund Council from Government funding, administered by the Royal Society of New Zealand, under Grant No. 16-UOA-239. Sorensen gratefully acknowledges the support of the Kelliher Trust PhD Scholarship. We thank Corelogic New Zealand for providing the residential transaction dataset.

Author information



Corresponding author

Correspondence to Ryan Greenaway-McGrevy.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Spatial distance calculation

The ‘haversine’ approach is used to measure the distance between two points. This is a standard trigonometric approach for estimating the minimum distance between two sets of coordinates over a spherical surface (such as the earth). The formula for spherical distance is given below:

$$\begin{aligned} d_{i,j}=6371\times \sqrt{(lat_{i}-lat_{j})^{2}+\left( \cos \left( \frac{ (lat_{i}+lat_{j})}{2}\right) \times (long_{i}-long_{j})\right) ^{2}} \end{aligned} \tag{16}$$

where 6371 is the approximate spherical radius of the earth in kilometers (km), implying that \(d_{i,j}\) is given in km. \(lat_{i}\) denotes the latitude coordinate for observation i (Y axis in a plane), and \(long_{i}\) denotes the longitude coordinate (X axis in a plane) for observation i. Coordinates are based on the WGS 1984 coordinate system. Every point on the earth has a WGS 1984 set of latitude and longitude coordinates, and these are the basis for the modern Global Positioning System (GPS). An example is the Auckland Skytower, which has the WGS 1984 coordinate location of \(-36.8482\), 174.7619.
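As a rough illustration, the distance formula displayed above can be evaluated as follows. This is a minimal sketch, not the authors' code: it assumes coordinates are supplied in decimal degrees and converted to radians (needed for \(d_{i,j}\) to come out in km), and the function names and the second coordinate pair are hypothetical. The displayed expression has the small-angle (equirectangular) form, so the textbook haversine great-circle distance is included for comparison; over suburb-scale distances the two should agree very closely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate spherical radius used in the text


def appendix_distance_km(lat_i, lon_i, lat_j, lon_j):
    """Evaluate the displayed distance formula; inputs in decimal degrees."""
    # Assumption: degrees are converted to radians so the result is in km.
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    lam_i, lam_j = math.radians(lon_i), math.radians(lon_j)
    x = math.cos((phi_i + phi_j) / 2.0) * (lam_i - lam_j)
    y = phi_i - phi_j
    return EARTH_RADIUS_KM * math.sqrt(x * x + y * y)


def haversine_km(lat_i, lon_i, lat_j, lon_j):
    """Textbook haversine great-circle distance, included for comparison."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    dphi = phi_j - phi_i
    dlam = math.radians(lon_j - lon_i)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


sky_tower = (-36.8482, 174.7619)    # coordinates quoted in the text
other_point = (-36.9000, 174.8000)  # hypothetical second location

d_appendix = appendix_distance_km(*sky_tower, *other_point)
d_haversine = haversine_km(*sky_tower, *other_point)
print(f"formula above: {d_appendix:.4f} km, haversine: {d_haversine:.4f} km")
```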

Spatial distance with elevation

Information on the elevation of a geographic area is increasingly available through detailed digital elevation models (DEMs). These datasets can be spatially joined to standard geographic coordinates, adding vertical dimensionality to planar data, and offering researchers the ability to model spatial dependence in three dimensions rather than two. The SMMA method can straightforwardly accommodate information on elevation by re-specifying the spatial kernel to span three dimensions:

$$\begin{aligned} K^{+}(d_{i,j})={\left\{ \begin{array}{ll} \left( 1-\frac{dist_{i,j}^{PLN}}{bw_{dist}^{PLN}}\right) \left( 1-\frac{dist_{i,j}^{ELV}}{bw_{dist}^{ELV}}\right) &{} \text {for } dist_{i,j}^{ELV}<bw_{dist}^{ELV},\ dist_{i,j}^{PLN}<bw_{dist}^{PLN} \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

where \(dist_{i,j}^{ELV}\) is the absolute elevation difference between observations i and j.
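The three-dimensional kernel can be sketched as a simple product of two linearly decaying weights. The function name, argument names and the numerical values below are hypothetical and serve only to illustrate the formula above; both distances and both bandwidths are assumed to be in the same units (here km).

```python
def elevation_kernel(dist_pln, dist_elv, bw_pln, bw_elv):
    """Product kernel over planar distance and absolute elevation difference.

    Returns a weight in [0, 1]: the product of two linearly decaying terms
    when both distances fall inside their bandwidths, and zero otherwise.
    """
    if dist_pln < bw_pln and dist_elv < bw_elv:
        return (1.0 - dist_pln / bw_pln) * (1.0 - dist_elv / bw_elv)
    return 0.0


# Hypothetical example: 1 km planar bandwidth, 50 m (0.05 km) elevation bandwidth.
w_near = elevation_kernel(0.5, 0.025, bw_pln=1.0, bw_elv=0.05)  # both halfway
w_far = elevation_kernel(2.0, 0.0, bw_pln=1.0, bw_elv=0.05)     # outside planar bandwidth
print(w_near, w_far)
```

Under this specification, two properties that are close on the map but separated by a large elevation difference receive zero weight, which is the extra structure the elevation term adds.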

Spatiotemporal robust covariance estimator

The spatiotemporal robust plug-in covariance estimator is based on the panel estimator of Kim and Sun (2013). We can think of this as an augmentation of the standard Conley (1999) estimator with a Newey-West type estimator for autocorrelation. The setup is presented below in (17).

$$\begin{aligned} \hat{\Omega }^{SPT}=\frac{1}{nT}\sum _{i,j=1}^{n}\sum _{t,s=1}^{T}K(D_{ij}) \cdot K(T_{ts})\cdot \hat{V}_{i,t}\hat{V}_{j,s}^{\prime } \end{aligned} \tag{17}$$

where \(\hat{V}_{i,t}=X_{i,t}e_{i,t}\), \(K(D_{ij})=1-\frac{d_{i,j}}{bw_{D}}\), \(K(T_{ts})=1-\frac{|t-s|}{bw_{T}}\). \(d_{i,j}\) is calculated as in (16). Note \(bw_{D}\) and \(bw_{T}\) denote distance and time bandwidths, respectively.
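A minimal NumPy sketch of this double sum is given below. It is an illustration under stated assumptions rather than the authors' implementation: the data are assumed to form a balanced panel stored as an `(n, T, k)` regressor array and an `(n, T)` residual array, the kernels are assumed to be truncated at zero beyond the bandwidth (as in a Bartlett kernel), and the names `bartlett` and `omega_spt` are hypothetical.

```python
import numpy as np


def bartlett(x, bw):
    """Kernel 1 - x/bw; truncation at zero beyond the bandwidth is an assumption."""
    return np.clip(1.0 - np.asarray(x, dtype=float) / bw, 0.0, None)


def omega_spt(X, e, dist, bw_d, bw_t):
    """Spatiotemporal kernel-weighted covariance estimate.

    X    : (n, T, k) array of regressors X_{i,t}
    e    : (n, T) array of residuals e_{i,t}
    dist : (n, n) array of pairwise distances d_{i,j}
    """
    n, T, _ = X.shape
    V = X * e[:, :, None]                        # V_{i,t} = X_{i,t} e_{i,t}
    K_D = bartlett(dist, bw_d)                   # (n, n) spatial weights
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    K_T = bartlett(lags, bw_t)                   # (T, T) temporal weights
    # Sum over i, j, t, s of K_D[i,j] * K_T[t,s] * V[i,t] V[j,s]'
    return np.einsum("ij,ts,itk,jsl->kl", K_D, K_T, V, V) / (n * T)
```

When both bandwidths are small enough that only the terms with \(i=j\) and \(t=s\) receive weight, the sum collapses to the familiar heteroskedasticity-robust form \(\frac{1}{nT}\sum _{i,t}\hat{V}_{i,t}\hat{V}_{i,t}^{\prime }\), which provides a convenient check on the code.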

Diebold-Mariano test of forecast accuracy

The standard Diebold-Mariano (DM) test statistic for equal forecast accuracy is:

$$\begin{aligned} D^{DM}=\frac{\bar{d}}{\sqrt{\frac{\hat{S}}{nT}}} \end{aligned}$$

where \(\bar{d}\) is the mean loss differential in squared forecast errors and \(\hat{S}\) is the estimated long-run variance of the loss differential sequence. We generalize the DM test to a spatial panel data set-up as follows. First, we let \(\hat{Z}_{i,t}\) denote the difference in squared forecast error between models \(m_{1}\) and \(m_{2}\):

$$\begin{aligned} \hat{Z}_{i,t}=\hat{e}_{i,t,m_{1}}^{2}-\hat{e}_{i,t,m_{2}}^{2}, \end{aligned}$$

Then the mean loss differential is \(\bar{d}=\frac{1}{nT}\sum _{i=1}^{n} \sum _{t=1}^{T}\hat{Z}_{i,t}\). Finally, we obtain a “long run” estimate of the variance of the loss differential that is similar to the spatiotemporal estimator introduced in (17):

$$\begin{aligned} \hat{S}=\frac{1}{nT}\sum _{i,j=1}^{n}\sum _{t,s=1}^{T}K(D_{ij})\cdot K(T_{ts})\cdot (\hat{Z}_{i,t}-\bar{d})(\hat{Z}_{j,s}-\bar{d}) \end{aligned}$$

Akgun et al. (2020) provide conditions under which panel DM statistics scaled by \(\sqrt{nT}\) follow a standard Normal limiting distribution under the null hypothesis of equal forecast accuracy, including spatial panel statistics such as \(D^{DM}\).
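The steps above can be sketched in NumPy as follows. This is an illustrative sketch only: it assumes a balanced `(n, T)` panel of forecast errors, kernels truncated at zero beyond the bandwidth, and a two-sided p-value taken from the standard Normal limit; the names `bartlett` and `spatial_panel_dm` are hypothetical.

```python
import math

import numpy as np


def bartlett(x, bw):
    """Kernel 1 - x/bw; truncation at zero beyond the bandwidth is an assumption."""
    return np.clip(1.0 - np.asarray(x, dtype=float) / bw, 0.0, None)


def spatial_panel_dm(e1, e2, dist, bw_d, bw_t):
    """Spatial panel DM statistic for squared-error loss.

    e1, e2 : (n, T) forecast errors from models m1 and m2
    dist   : (n, n) pairwise distances
    Returns (statistic, two-sided p-value under the standard Normal limit).
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    n, T = e1.shape
    Z = e1 ** 2 - e2 ** 2                       # loss differential Z_{i,t}
    d_bar = Z.mean()
    Zc = Z - d_bar
    K_D = bartlett(dist, bw_d)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    K_T = bartlett(lags, bw_t)
    # (Zc @ K_T @ Zc.T)[i, j] = sum over t, s of Zc[i,t] K_T[t,s] Zc[j,s]
    S = np.sum(K_D * (Zc @ K_T @ Zc.T)) / (n * T)
    if S <= 0:
        raise ValueError("estimated long-run variance is not positive")
    stat = d_bar / math.sqrt(S / (n * T))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

With this sign convention, a negative statistic indicates that model \(m_{1}\) has the smaller average squared forecast error. The explicit check on \(\hat{S}\) is included because a kernel-weighted sum of this kind is not guaranteed to be positive in finite samples.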


Cite this article

Greenaway-McGrevy, R., Sorensen, K. A spatial model averaging approach to measuring house prices. J Spat Econometrics 2, 6 (2021).



Keywords

  • Hedonic Imputation Price Index
  • House prices
  • Model averaging
  • Mallows model averaging

JEL Classification

  • R31
  • E31