A spatial semiparametric M-quantile regression for hedonic price modelling

Schirripa Spagnolo, Francesco; Borgoni, Riccardo; Carcagnì, Antonella; Michelangeli, Alessandra; Salvati, Nicola

doi:10.1007/s10182-023-00476-w

Original Paper
Open access
Published: 30 March 2023

Volume 108, pages 159–183, (2024)
AStA Advances in Statistical Analysis

Francesco Schirripa Spagnolo (ORCID: 0000-0002-5014-8041)¹, Riccardo Borgoni², Antonella Carcagnì², Alessandra Michelangeli² & Nicola Salvati¹

Abstract

This paper proposes an M-quantile regression approach to address the heterogeneity of the housing market in a modern European city. We show how M-quantile modelling is a rich and flexible tool for empirical market price data analysis, allowing us to obtain a robust estimation of the hedonic price function whilst accounting for different sources of heterogeneity in market prices. The suggested methodology can generally be used to analyse nonlinear interactions between prices and predictors. In particular, we develop a spatial semiparametric M-quantile model to capture both the potential nonlinear effects of the cultural environment on pricing and spatial trends. In both cases, nonlinearity is introduced into the model using appropriate basis functions. We show how the implicit price associated with the variable that measures cultural amenities can be determined in this semiparametric framework. Our findings show that the effect of several housing attributes and urban amenities differs significantly across the response distribution, suggesting that buyers of lower-priced properties behave differently from buyers of higher-priced properties.

1 Introduction

In the hedonic approach, the price of a house is interpreted as a market evaluation of the particular package of characteristics embodied in it using a hedonic price function (Gravel et al. 2006). The heterogeneity of preferences amongst households, however, implies that the estimated implicit prices of housing characteristics at different points of the house price distribution may not be constant. To date, some researchers have used quantile regression (Koenker and Bassett 1978) to capture consumer heterogeneity in housing demand or to identify housing submarkets (Amédée-Manesme et al. 2017).

This paper is the first attempt to use the M-quantile approach to model the distribution of housing prices conditional on housing attributes and estimate the implicit prices of these attributes at different points of the housing price distribution. We focus on the Milan apartment market. Similar to other large cities, the apartment market in Milan is characterised by complex dynamics reflecting the heterogeneity of such a large city in terms of neighbourhood, building and population features. The Milan residential market has already been investigated in several fairly recent studies (Michelangeli and Zanardi 2009; Brambilla et al. 2013; Borgoni et al. 2018b, 2019). In particular, in one of these studies (Borgoni et al. 2018b), the hedonic approach was used to estimate the effect of culture, public transport, education and environmental conditions on the average housing market value in the city of Milan.

In this paper, we adopt a hedonic framework and apply a statistical model based on M-quantile regression to obtain a robust estimate of the hedonic price function that accounts for the market heterogeneity mentioned above, whilst preserving the efficiency of the regression parameter estimators (Alfò et al. 2017).

M-quantile regression (Breckling and Chambers 1988) is a robustified ‘quantile-like’ approach based on an influence function that can be used to capture the differential effect of covariates at different levels of the conditional distribution of the response variable. This approach also allows different sets of regressors to be specified at different levels of the response distribution and, as will be made clear later in this paper, encompasses a wide variety of models, ranging from expectile regression (Newey and Powell 1987) to ordinary multiple regression and quantile regression (Koenker and Bassett 1978), hence providing a very rich and flexible tool for empirical market price data analysis. In fact, M-quantile regression can also be seen as a combination of quantile and expectile regression that joins the robustness properties of quantiles with the efficiency properties of expectiles (Alfò et al. 2017). Hence, the suggested approach provides robust estimators of the parameters of interest whilst preserving their efficiency.

In addition, M-quantile regression is nonparametric in nature, which permits one to avoid the ubiquitous log transformation of the response variable typically adopted in regression analyses of house prices. As will be discussed in more detail at the end of the paper, the log transformation has major disadvantages when one is interested in estimating the implicit prices of amenities, since the log-prices must be back-transformed to the original scale, which introduces bias into the estimates.
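The back-transformation bias mentioned above can be seen in a short simulation. Assuming, purely for illustration, that log-prices are normally distributed with hypothetical mean `mu` and standard deviation `sigma`, exponentiating the mean log-price recovers the median price rather than the mean price, understating the latter by roughly the factor exp(−σ²/2). This is a sketch of the general phenomenon, not the analysis carried out in the paper.

```python
import numpy as np

# Hypothetical log-price parameters, chosen only for illustration.
mu, sigma = 12.0, 0.6

rng = np.random.default_rng(42)
log_price = rng.normal(mu, sigma, size=200_000)
price = np.exp(log_price)

naive_mean = np.exp(log_price.mean())   # back-transform of the mean log-price
sample_mean = price.mean()              # mean price on the original scale

# Under log-normality the ratio is close to exp(-sigma**2 / 2), i.e. below 1.
ratio = naive_mean / sample_mean
print(f"ratio = {ratio:.3f}, theory = {np.exp(-sigma ** 2 / 2):.3f}")
```

Any correction for this bias requires extra distributional assumptions about the log-scale errors, which is precisely what modelling prices on their original scale avoids.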

Since its introduction, M-quantile regression has been developed in several directions. Chambers and Tzavidis (2006) suggested that this approach can be an alternative to the mixed effect model in small area estimation, and Chambers et al. (2016) applied M-quantile regression to binary data in this context. To account for the hierarchical structure of many datasets, Tzavidis et al. (2016) and Borgoni et al. (2018a) extended the M-quantile regression approach to two- and three-level random effect models, respectively, and Schirripa Spagnolo et al. (2020) included sampling weights in the M-quantile random-effects regression estimation procedure. Alfò et al. (2017) developed a finite mixture of quantile and M-quantile regression models for heterogeneous and/or dependent/clustered data. A semiparametric specification of M-quantile regression has been obtained by including univariate and bivariate spline components in the linear predictor to capture nonlinearities or to account for spatial trends (Pratesi et al. 2009; Dreassi et al. 2014).

In this paper, we also include parametric and semiparametric components in the model to account for the nonlinear effects of some predictors on price formation (Brunauer et al. 2013). As mentioned above, the spatial component is fundamental in determining house prices. Hereafter, we mainly investigate the spatial variability of prices induced by spatial trends, which are modelled flexibly using appropriate basis functions.

Moreover, we propose a method to determine the implicit price associated with the attribute modelled by the semiparametric component. The implicit price corresponds to the partial derivative of the hedonic price function with respect to that attribute, at every M-quantile of interest. We show how to calculate this derivative when a semiparametric component is included in the model.

Our empirical findings show that the effects of several housing attributes vary greatly across the response distribution, suggesting that buyers of lower-priced properties behave differently from buyers of higher-priced properties.

The remainder of this paper is organised into six sections. Section 2 presents an overview of the theoretical framework of hedonic price modelling. In Sect. 3, the dataset employed for the analysis is described. The statistical methodologies applied in this paper are discussed in detail in Sect. 4. Section 5 presents the empirical results. Conclusions are summarised in the last section of the paper.

2 Hedonic price modelling

In hedonic price theory, housing is viewed as a bundle of utility-bearing characteristics that are usually divided into housing-specific attributes and (dis)amenities, i.e. location-specific characteristics with a (negative) positive impact on household utility. Accordingly, in the hedonic approach, the price of a house is interpreted as a market evaluation of the particular package of characteristics embodied in it using a hedonic price function.

Rosen (1974) first developed a partial equilibrium model, where the supply of housing units is supposed to be fixed. This implies that housing prices are entirely demand driven. The theoretical framework shows that when an individual chooses a housing unit to buy, he implicitly decides the best combination of housing-specific attributes and local amenities according to his preferences and budget constraints. At equilibrium, households, who are price takers, select the preferred housing unit by equalising their marginal evaluation of each housing characteristic to the hedonic price implicitly determined by the housing market.

2.1 Theoretical framework

Mathematically, the hedonic price corresponds to the first derivative of the hedonic price function with respect to a given characteristic. In the case of a representative consumer in the housing market or, equivalently, assuming that all households are identical, the implicit price associated with a characteristic is the consumer’s marginal willingness to pay for an additional amount of that characteristic at the consumer’s optimal choice. For a given housing unit, let ${\varvec{x}}\in {\mathbb {R}}^{H}$ be the vector of the unit’s characteristics, considered as normal goods. The representative consumer’s preferences are represented by an increasing and strictly concave utility function $U\left( {\varvec{x}}, w \right)$, where $w\in {\mathbb {R}}_{+}$ is the composite good, assumed to be the numéraire. Let $P\left( {\varvec{x}}\right) $ be the equilibrium price schedule associated with the housing unit with attributes ${\varvec{x}}$. The optimal bundle corresponds to the solution of the following problem:

$$\begin{aligned} \max _{\left( {\varvec{x}},w\right) \in {\mathbb {R}}^{H+1}_+} U\left( {\varvec{x}},w\right) \quad \text {s.t.}\quad m\ge P\left( {\varvec{x}}\right) +w, \end{aligned}$$

where m represents the consumer’s monetary resources. The first-order conditions for an interior solution $({{\varvec{x}}}^*,w^*)$ imply the following set of equations:

$$\begin{aligned} \frac{\partial P\left( {{\varvec{x}}}^*\right) }{\partial x_h}=\frac{{U\left( {{\varvec{x}}}^*,w^*\right) }_{x_h}\ }{{U\left( {{\varvec{x}}}^*,\ w^*\right) }_w},\ \ \ \ \ \ \forall \ h=1,\dots ,H, \end{aligned}$$

where $P\left( {{\varvec{x}}}^*\right) =m-w^*$ (the budget constraint binds at the optimum), ${U\left( \cdot \right) }_{x_h}$ is the consumer’s marginal utility associated with the unit’s characteristic $x_h$, and ${U\left( \cdot \right) }_w$ is the marginal utility associated with the numéraire. At the optimum, the implicit price of $x_h$ equals the marginal rate of substitution between $x_h$ and the numéraire, i.e. the marginal willingness to pay for an additional amount of $x_h$.
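A numerical check makes the first-order condition concrete. The sketch below assumes a single characteristic and hypothetical functional forms that do not come from the paper: $U(x, w) = a\log x + \log w$ and a convex price schedule $P(x) = p x^2$. It maximises utility along the binding budget constraint with `scipy.optimize.minimize_scalar` and verifies that the implicit price $\partial P/\partial x$ equals the marginal rate of substitution $U_x/U_w$ at the optimum. For these forms the first-order condition also has the closed-form solution $x^* = \sqrt{a m / (p(2+a))}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical primitives (not from the paper): one characteristic x,
# U(x, w) = a*log(x) + log(w) and a convex price schedule P(x) = p*x**2.
a, p, m = 0.5, 2.0, 100.0

def price(x):
    return p * x ** 2

def neg_utility(x):
    w = m - price(x)                         # budget binds: w = m - P(x)
    return -(a * np.log(x) + np.log(w))

x_upper = np.sqrt(m / p)                     # w > 0 requires P(x) < m
res = minimize_scalar(neg_utility, bounds=(1e-9, x_upper - 1e-9),
                      method="bounded", options={"xatol": 1e-10})
x_star = res.x
w_star = m - price(x_star)

implicit_price = 2 * p * x_star              # dP/dx at the optimum
mrs = (a / x_star) / (1.0 / w_star)          # U_x / U_w at the optimum
x_closed = np.sqrt(a * m / (p * (2 + a)))    # closed-form solution of the FOC
print(f"x* = {x_star:.6f} (closed form {x_closed:.6f}), "
      f"dP/dx = {implicit_price:.4f}, MRS = {mrs:.4f}")
```

Because $P$ is convex here, the implicit price rises with $x^*$, an instance of the nonlinear hedonic price structure discussed below.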

In empirical applications, the classical estimation of the hedonic price function by ordinary least squares fits the representative agent framework since the estimated implicit price is a measure, on average, of the impact of each characteristic on housing prices (Zietz et al. 2008). However, when the representative consumer assumption is removed to analyse the market with heterogeneous households, it is likely that housing attributes are valued quite differently across the conditional price distribution. As we briefly review in the next section, several studies have provided empirical evidence that confirms these differences.

A further aspect that is worth mentioning is that the theoretical model sketched in this section naturally leads to a nonlinear hedonic price structure (Malpezzi 2002). This means that the marginal willingness to pay for a given characteristic of the house is not constant (see Fritsch et al. 2016 and references therein). As shown by Freeman (1993), the curvature of the hedonic price function could be convex, concave or linear and it is generally accepted that the hedonic price function is nonlinear (Kostov 2009). In our empirical application, we address this issue by using splines and polynomial terms for modelling nonlinearities.

2.2 The use of QR-based models for the house price function

As mentioned in the Introduction, quantile regression has long been used in modelling house prices to capture consumer heterogeneity in housing demand or to identify housing submarkets (Amédée-Manesme et al. 2017). The heterogeneity of household preferences implies that the estimated implicit prices of housing characteristics at different points of the house price distribution may not be constant. Quantile regression allows one to determine the extent to which housing characteristics are valued differently across the distribution of housing prices (Mak et al. 2010). Variation across the price distribution is referred to as vertical market segmentation, whereas the presence of irregularly distributed spatial subcentres in the price surface is referred to as horizontal housing market segmentation (Fritsch et al. 2016).

Bayer et al. (2004) found that the marginal willingness to pay for desirable housing characteristics and neighbourhood amenities increases with income and that the housing preferences of poor and wealthy households differ (Leung and Tsang 2012). Uematsu et al. (2013) employed a quantile regression approach to investigate the potentially heterogeneous impact of natural amenities on farmland values in the USA; including regional dummies in the model specification allows the differences in farmland values across regions to be estimated. Chasco and Le Gallo (2015) and Chasco and Sánchez (2015) evaluated the impact of air pollution and urban noise. Huang (2018) focused on schools, whilst Diao et al. (2018) examined the effect of rail infrastructure and public transport on housing prices. The findings of these studies show important variations in the willingness to pay for better levels of these amenities. Waltl (2019) combined penalised quantile regression models with the hedonic imputation approach to construct house price indices.

The spatial dimension plays a central role in the analysis of house price dynamics. The spatial component has been introduced only recently in quantile modelling of house prices (Trzpiot 2012; McMillen 2012). An increasing number of studies have used spatial econometrics to control for spatial dependence and spatial heterogeneity (Wan et al. 2017). For example, Kostov (2009) applied a spatial lag quantile regression to a hedonic land price model, allowing both the effects of the hedonic characteristics and the degree of spatial lag autocorrelation to vary. McMillen (2012, 2015) used a conditionally parametric quantile model accounting for local variation in an overall spatial trend. The advantage of this model is that it is computationally feasible for quite large datasets. Moreover, the author presented a series of graphs that make it easy to illustrate the effects of discrete changes in the explanatory variables on the distribution of the dependent variable. Fritsch et al. (2016) incorporated a semiparametric approach into the quantile regression framework to flexibly account for nonlinear covariate effects when studying the rental housing market in the German city of Regensburg. Wan et al. (2017) and Tomal and Helbich (2022) proposed space-varying coefficient quantile regressions to examine the heterogeneity of the marginal effects of attributes across the distribution of housing prices. This approach, which allows the coefficients to vary with a variable not included in the linear predictor, permits nonlinear interactions between this effect modifier and the other covariates, providing a flexible tool to investigate price heterogeneity. In the latter paper, a spatial autoregressive geographically weighted quantile regression was proposed to explore housing rent determinants in Amsterdam and Warsaw, showing that these determinants vary over space and across the price distribution.

3 Data description

The data come from different sources and are combined into a single dataset by exploiting the geocoding of each data source.

Housing market data are from the Real Estate Observatory (Osservatorio del Mercato Immobiliare). The dataset consists of 4000 individual housing transactions in Milan that occurred between 2004 and 2010. In addition to housing market values, the dataset provides information about the main characteristics of the housing units in the sample, discussed below. Housing units in the sample are spatially identified by their civic address. Each civic address is geocoded to its UTM coordinates using JavaScript to retrieve this information from the Google Maps geographical databases. This allows us to add geocoded data on urban amenities to the housing transaction dataset. Urban amenity variables are taken from the open data portal of the municipality of Milan and the Regional Environmental Protection Agency (ARPA) of the Lombardy region. In particular, we consider the availability of public transport, education, cultural activities and related infrastructure, and the presence of abandoned areas. Finally, to control for the effect of the 2008 financial crisis on the housing market, a binary variable distinguishing the transactions that occurred before and after that year has also been defined: it is equal to 1 if the housing unit was sold in the post-crisis sample period (2009 onwards) and 0 otherwise. The list, description and definition of the variables used in the empirical analysis are given in Table 1. Descriptive statistics for the price and explanatory variables are provided in Table 2. This description of the dataset is completed below with a few additional comments on some of the variables.

Table 1 Variables description

3.1 Housing-specific characteristics

We consider the following housing-specific attributes: the total floor area, the floor level, the presence of two or more bathrooms, the presence of an elevator, whether the housing unit has an independent heating system, the presence of a garage and the age of the building. Regarding the floor level and the building’s age, we adopt the same coding suggested by Michelangeli and Zanardi (2009): the floor level is divided into three levels (the house is on the ground or first floor; the house is on the second or third floor; the house is on the fourth floor or higher), and the building’s age is divided into two levels according to whether the unit was built before or after 1950.

3.2 Urban amenities

Culture. Cultural amenities and related infrastructure are measured by the Cultural Catalyst developed by Borgoni et al. (2018b). It is a composite indicator of the following cultural amenities: theatres, museums, libraries and auditoria. The Cultural Catalyst is obtained in two steps: in the first step, an accessibility index for the four cultural amenities is constructed according to the following equation:

$$\begin{aligned} {{\tilde{v}}}_j=\sum ^{S_j}_{s=1}{w_{js}e^{(-\gamma d_{hjs})}}, \end{aligned}$$

where ${{\tilde{v}}}_j$ is the variable measuring the accessibility of housing unit h to amenity $v_j$; $S_j$ is the total number of locations of amenity $v_j$; $w_{js}$ is the weight associated with site s of amenity $v_j$, set equal to 1 throughout (i.e. ${{\tilde{v}}}_j$ is a distance-weighted count of the amenity sites in the study area); $e^{-\gamma d_{hjs}}$ is a distance-decay function; $d_{hjs}$ is the Euclidean distance (in metres) between housing unit h and site s where amenity $v_j$ is located; and the parameter $\gamma$ is selected via cross-validation, as suggested by Borgoni et al. (2018b). In the second step, a principal component (PC) analysis is performed via a singular value decomposition of the correlation matrix of the four accessibility variables. Only the largest eigenvalue is found to be significantly larger than 1; hence, the first PC defines the Cultural Catalyst, and the first PC scores are the sample values of the index.
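The two-step construction of the Cultural Catalyst can be sketched as follows. The coordinates, the numbers of amenity sites and the decay parameter `gamma` are hypothetical (the cross-validation used in the paper to select γ is not reproduced), all weights $w_{js}$ are set to 1 as in the text, and the PC is obtained from an eigendecomposition of the correlation matrix, which for a symmetric positive semidefinite matrix is equivalent to its singular value decomposition. Fixing the sign of the first PC so that larger scores mean better accessibility is a convention assumed here for readability.

```python
import numpy as np

def accessibility(houses, sites, gamma):
    """Distance-decay accessibility sum_s exp(-gamma * d_hs), with all weights w_js = 1."""
    d = np.linalg.norm(houses[:, None, :] - sites[None, :, :], axis=2)  # (n, S) distances in metres
    return np.exp(-gamma * d).sum(axis=1)

def first_pc_scores(V):
    """Scores on the first principal component of the correlation matrix of V's columns."""
    Z = (V - V.mean(axis=0)) / V.std(axis=0, ddof=1)   # standardised accessibility variables
    R = np.corrcoef(V, rowvar=False)                    # p x p correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loading = eigvec[:, 0]
    if loading.sum() < 0:                               # the sign of a PC is arbitrary; fix it
        loading = -loading
    return Z @ loading, eigval

# Hypothetical example: 200 houses and four amenity types
# (theatres, museums, libraries, auditoria) with made-up site counts.
rng = np.random.default_rng(0)
houses = rng.uniform(0, 10_000, size=(200, 2))
site_counts = [15, 30, 25, 10]
V = np.column_stack([
    accessibility(houses, rng.uniform(0, 10_000, size=(k, 2)), gamma=1e-3)
    for k in site_counts
])
catalyst, eigval = first_pc_scores(V)
print("eigenvalues:", np.round(eigval, 3))
```

The printed eigenvalues can be compared with 1, the threshold used in the text to retain only the first component.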

Public transport. Accessibility of the dwelling is measured by the distance from each housing unit to the nearest metro station. A georeferenced map of the metro stations is available from the open data portal of the municipality of Milan. A binary accessibility index is calculated for each housing unit according to whether this distance is larger than the third quartile of the distances of all the dwellings in the sample.

Education. As in Brambilla et al. (2013) and Garretsen and Marlet (2017), universities are considered a proxy for education.^{Footnote 1} There are 710 university sites in the municipality of Milan, spread across the municipal area and belonging to seven main institutions and four academies of arts and design. We assume that, when choosing a residence, a potential dweller considers proximity to the specific higher education institution he is interested in rather than to a variety of different institutions; thus, we construct a proximity index to capture this effect. More specifically, we first geolocate all the university sites and then calculate the distance between each sample unit and its nearest university site. Finally, to distinguish the housing units most exposed to university proximity from the others, we derive a binary index, which takes the value 1 if this distance is in the bottom 25% of all the distances calculated for the sample units and 0 otherwise.

Urban slum areas. The presence of ruined or degraded buildings or slum areas is expected to negatively affect the prices of nearby houses. Information on the location of abandoned buildings and areas (private, productive or natural) is available from the open data portal of the municipality of Milan, which provides a shapefile reporting the UTM coordinates of those sites. To calculate a dismissed area index, the Euclidean distance between each house in the sample and each abandoned site is calculated, and a dummy variable indicating whether at least one ruined site lies within 200 m of each sample unit is constructed to account for the potential impact of neighbouring degradation.
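The three binary indicators described above (metro, university and urban slum areas) can be sketched from projected coordinates in metres as follows. The function names are hypothetical, every site is treated as a point (an abandoned area stored as a polygon in the shapefile would first have to be reduced to points or handled with a geometry library), and numpy's default linear interpolation is assumed for the sample quartiles.

```python
import numpy as np

def nearest_distance(houses, sites):
    """Euclidean distance from each house to its nearest site (same units as the coordinates)."""
    d = np.linalg.norm(houses[:, None, :] - sites[None, :, :], axis=2)
    return d.min(axis=1)

def far_from_metro(d_metro):
    """1 if the distance to the nearest metro station exceeds the sample third quartile."""
    return (d_metro > np.quantile(d_metro, 0.75)).astype(int)

def near_university(d_univ):
    """1 if the distance to the nearest university site is in the bottom 25% of the sample."""
    return (d_univ <= np.quantile(d_univ, 0.25)).astype(int)

def slum_nearby(houses, slum_sites, radius=200.0):
    """1 if at least one abandoned site lies within `radius` metres of the house."""
    d = np.linalg.norm(houses[:, None, :] - slum_sites[None, :, :], axis=2)
    return (d <= radius).any(axis=1).astype(int)

# Toy example: 8 houses on a line, 100 m apart; one station and one abandoned site.
houses = np.column_stack([np.arange(8) * 100.0, np.zeros(8)])
d = nearest_distance(houses, np.array([[0.0, 0.0]]))        # 0, 100, ..., 700 m
metro_flag = far_from_metro(d)
univ_flag = near_university(d)
slum_flag = slum_nearby(houses, np.array([[150.0, 0.0]]))
print(metro_flag, univ_flag, slum_flag)
```

Note that the two quartile-based indices are defined relative to the sample, so they would change if the sample changed, whereas the 200 m slum indicator uses a fixed threshold.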

3.3 Preliminary analysis

Table 2 shows some summary statistics of the variables described above, and it also describes how the house price conditional distribution changes according to their levels. For instance, the difference between the first sample quartile of the prices of housing units with one bathroom and the first sample quartile of housing units with two or more bathrooms is €4,305, whereas this difference is €14,729 in the third quartile. In percentage terms, the first sample quartile of housing prices with two or more bathrooms is 83% higher than the same quartile of one-bathroom houses; this spread increases to 168% in the third quartile. Looking at the parking area, the difference between the first quartile of housing prices of units with and without a parking area is €2,902, and it increases to €7,575 in the third sample quartile. This suggests that having two or more bathrooms or a parking area has a much larger impact for high-valued houses than for less expensive houses. Similar patterns are found for other variables in Table 2.
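The quartile comparison above amounts to simple arithmetic on group-wise sample quantiles. The sketch below, with hypothetical toy prices, computes the absolute gap and the percentage gap $100\,(Q_{\text{with}}/Q_{\text{without}} - 1)$ at the first and third quartiles. Sample quantile definitions differ across software; numpy's default linear interpolation is assumed here, so results on real data may differ slightly from those in Table 2.

```python
import numpy as np

def quartile_gaps(price, has_feature):
    """Absolute and percentage gaps between feature / no-feature groups at Q1 and Q3."""
    price = np.asarray(price, dtype=float)
    g = np.asarray(has_feature, dtype=bool)
    gaps = {}
    for label, prob in (("Q1", 0.25), ("Q3", 0.75)):
        q_without = np.quantile(price[~g], prob)
        q_with = np.quantile(price[g], prob)
        gaps[label] = (q_with - q_without, 100.0 * (q_with / q_without - 1.0))
    return gaps

# Toy example (hypothetical prices): the feature doubles every price.
price = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
has_feature = [0] * 5 + [1] * 5
gaps = quartile_gaps(price, has_feature)
print(gaps)
```

In the toy data the percentage gap is the same at both quartiles while the absolute gap doubles; Table 2 shows the percentage gap itself widening, which is the pattern that motivates a quantile-like model.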

Table 2 Housing prices (in euros) summary statistics by dwelling characteristics

From this preliminary analysis, a quantile-like approach seems more appropriate than ordinary multiple regression to account for possible variations in the implicit prices along the house price distribution.

To assess the need for a robust approach, we first estimated the following standard semiparametric linear model:

$$\begin{aligned} \text {E}\left( y|{\varvec{x}}, t, {\textbf{s}}_i\right) ={\varvec{x}}'{\varvec{\beta }}+f_{1}(t) + f_{2}({\textbf{s}}_i). \end{aligned}\tag{1}$$

Here, $f_{1}(\cdot )$ and $f_{2}(\cdot )$ are nonlinear functions represented by spline terms, which are discussed in detail in Sect. 4; t is the Cultural Catalyst, which is expected to affect y nonlinearly; ${\textbf{s}}_i \in {\mathbb {R}}^2$ are the geographical coordinates of unit i; and ${\varvec{x}}$ contains the set of variables listed in Table 1. The actual specification of each variable included in the additive predictor is clarified in Sect. 5.

Figure 1(a) shows the plot of the standardised residuals of the model above, and Fig. 1(b) their normal probability plot. These two plots indicate that the normality assumption adopted in standard semiparametric modelling does not hold. This is confirmed by the Shapiro–Wilk test, which rejects the null hypothesis of normality ($\text {p-}value\simeq 0$). Moreover, outliers are easily detectable in the plots: the proportion of outliers, i.e. standardised residuals larger than 2 in absolute value, is approximately $4\%$.
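The diagnostics above can be reproduced for any fitted linear predictor. The sketch below uses ordinary least squares as a stand-in for the model in Eq. (1) (an assumption: ignoring the spline penalty, the spline bases would simply enter as extra columns of the design matrix), computes internally standardised residuals, runs the Shapiro–Wilk test via `scipy.stats.shapiro`, and reports the share of residuals larger than 2 in absolute value. The simulated data with heavy-tailed errors are hypothetical.

```python
import numpy as np
from scipy import stats

def residual_diagnostics(X, y):
    """OLS fit, internally standardised residuals, Shapiro-Wilk test and outlier share."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    n, p = X.shape
    sigma = np.sqrt(r @ r / (n - p))                 # residual standard error
    Q, _ = np.linalg.qr(X)                           # reduced QR: Q is n x p
    h = (Q ** 2).sum(axis=1)                         # leverages (hat-matrix diagonal)
    z = r / (sigma * np.sqrt(1.0 - h))               # internally standardised residuals
    W, p_value = stats.shapiro(z)
    return z, W, p_value, np.mean(np.abs(z) > 2)

# Hypothetical data with heavy-tailed (Student t, 2 df) errors.
rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
y = X @ np.array([200.0, 30.0, 50.0]) + 20 * rng.standard_t(df=2, size=n)
z, W, p_value, share = residual_diagnostics(X, y)
print(f"W = {W:.3f}, p = {p_value:.2g}, share |z| > 2 = {share:.3f}")
```

Under Gaussian errors roughly 4.6% of standardised residuals would exceed 2 in absolute value, so the outlier share alone is only a rough guide; the normality test and the probability plot carry most of the evidence.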

To evaluate the spatial dependence in the data, Fig. 2(a) reports the spatial pattern of the house price Cultural Catalyst obtained by smoothing the observed values at the sampling locations via inverse distance weighted interpolation. The map shows a well-defined spatial structure, with larger values occurring towards the city centre. Figure 2(b) shows the empirical variogram of the residuals of the semiparametric linear model described above. The variogram appears roughly constant across distances (a situation known as a pure nugget effect in geostatistics), suggesting that the residuals show no spatial dependence once the effects of the regionalised variables and of the spatial trend have been taken into account.
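An empirical variogram such as the one in Fig. 2(b) can be computed with the classical method-of-moments (Matheron) estimator, which averages half the squared differences of residuals over the pairs of locations falling in each distance bin. The sketch below assumes residuals and planar coordinates in metres are already available; the bin edges are hypothetical, and the paper does not state which estimator or binning was used. For spatially uncorrelated residuals with variance σ², every bin should be close to σ², the pure nugget pattern described above.

```python
import numpy as np

def empirical_variogram(coords, resid, bin_edges):
    """Classical estimator: mean of 0.5 * (r_i - r_j)^2 over pairs in each distance bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    half_sq = 0.5 * (resid[:, None] - resid[None, :]) ** 2
    iu = np.triu_indices(len(resid), k=1)            # each unordered pair once
    d, half_sq = d[iu], half_sq[iu]
    idx = np.digitize(d, bin_edges) - 1              # bin b covers [edges[b], edges[b+1])
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = idx == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = half_sq[in_bin].mean()
    return gamma, counts

# Hypothetical check: spatially independent residuals give a flat (pure nugget) variogram.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10_000, size=(500, 2))       # planar coordinates in metres
resid = rng.normal(0.0, 1.0, size=500)
edges = np.linspace(0, 5_000, 11)
gamma, counts = empirical_variogram(coords, resid, edges)
print(np.round(gamma, 2))
```

The pairwise distance matrix grows quadratically with the sample size; for a few thousand transactions it still fits comfortably in memory, but larger samples would call for a binned or tree-based pair search.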

We also consider the residuals obtained from a regression model in which prices are taken on the log scale. In this case too, the normality assumption does not hold (the Shapiro–Wilk statistic is $W = 0.672$ with $\text {p-}value\simeq 0$), and the percentage of outlying observations remains substantially unchanged. This suggests that the log transformation, ubiquitous in hedonic price analysis, does not compensate for the presence of outlying observations or the lack of Gaussianity of the price data.

Robust estimation has been suggested in several papers (see Huggins 1993; Huggins and Loesch 1998) to address the non-normality of the dependent variable. This is achieved by replacing the squared loss implied by the Gaussian log-likelihood with a loss function that increases with the regression residuals at a slower rate. As will be shown in the next section, the M-quantile approach provides a robust and efficient estimator of the hedonic price function without assuming any specific probabilistic model for the data at hand.

Finally, we consider the potential nonlinear impact of the Cultural Catalyst. The need to include nonlinear effects in house price modelling has been discussed previously (see Sect. 2). Figure 3 plots the residuals of the M-quantile model of order 0.5 (see Sect. 4 for details), with the Cultural Catalyst removed from the linear predictor, against the Cultural Catalyst values. The plot shows that a nonlinear pattern remains in the residuals. This suggests adding a spline term in the Cultural Catalyst to the linear predictor, as discussed above.

4 The spatial semiparametric M-quantile model

M-quantile regression (Breckling and Chambers 1988) is a ‘quantile-like’ generalisation of regression based on influence functions (M-regression) and can be used to understand the differential effect of a covariate at different levels of the conditional distribution of the response variable. Let $f\left( y | {\varvec{x}}\right)$ be the conditional density of Y given the set of covariates x. The M-quantile of order q of this distribution is defined as the solution ${MQ}_Y\left( q |{\varvec{x}},\psi \right)$ of the integral equation $\int {\psi }_q\left[ y-{MQ}_Y\left( q | {\varvec{x}},\psi \right) \right] f\left( y | {\varvec{x}}\right) dy=0$, where ${\psi }_q$ denotes an asymmetric influence function. Given x, the linear M-quantile regression model is defined by ${MQ}_Y\left( q | {\varvec{x}},\psi \right) ={\varvec{x}}'\varvec{\beta }_q$, where $\varvec{\beta }_q$ is a vector of unknown, M-quantile-specific parameters. The set x includes a range of variables representing the housing-specific characteristics and urban amenities described in detail in Sect. 3. Throughout the paper, the influence function is obtained as the derivative of the Huber loss function ${\rho }_q\left( r\right)$ (Huber 2011), which is defined as follows:

$$\begin{aligned} {\rho }_q\left( r_{iq}\right) =\left\{ \begin{array}{ll} \left( 2c\left| r_{iq}\right| -c^2\right) \left\{ qI\left( r_{iq}>0\right) +\left( 1-q\right) I\left( r_{iq}\le 0\right) \right\} & \left| r_{iq}\right| >c, \\ r^2_{iq}\left\{ qI\left( r_{iq}>0\right) +\left( 1-q\right) I\left( r_{iq}\le 0\right) \right\} & \left| r_{iq}\right| \le c, \end{array} \right. \end{aligned}\tag{2}$$

where I(A) is the indicator function of the set A, $r_{iq}$ is the residual of unit i at M-quantile q, and c is a tuning constant. Conventionally, in M-regression, the tuning constant is chosen to trade off robustness against efficiency. Huber (2011) suggested that ‘good choices are in the range between 1 and 2’. A common default is c = 1.345, which guarantees 95% efficiency of the estimators under normality whilst still offering protection against outliers; this value is used in the rest of the paper. Note that different sets of regressors can be included in the linear predictor at different M-quantiles and that a wide range of models can be obtained by modifying the influence function and/or the tuning constant. For instance, using a squared loss function (the limit $c\rightarrow \infty$), the linear expectile regression model is obtained if $q\ne 0.5$ (Newey and Powell 1987), whereas setting $q = 0.5$ produces the standard linear regression model. Using the asymmetrically weighted absolute loss of Koenker and Bassett (1978), which corresponds, up to a constant factor, to the limit $c\rightarrow 0$, gives the linear quantile regression model. Hence, the approach suggested in this paper provides a very flexible tool for analysing housing market prices. We include a semiparametric component for the Cultural Catalyst in the model to account for its expected nonlinear effect, as discussed by Borgoni et al. (2018a). We also include a smooth bivariate function to capture the spatial trends in the data.
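Linear M-quantile regression is commonly fitted by iteratively reweighted least squares (IRLS), with weights $\psi_q(u)/u$ evaluated at residuals divided by a robust scale estimate. The sketch below is a minimal version of that scheme, assuming the asymmetric Huber influence function with c = 1.345 and a median-absolute-residual scale estimate divided by 0.6745, one common choice. It is not the authors' code, the function names are hypothetical, and it covers only the linear part of the model, without the penalised spline terms.

```python
import numpy as np

def mq_weights(u, q, c=1.345):
    """IRLS weights psi_q(u) / u for the asymmetric Huber influence function.

    psi_q(u) = 2 {q I(u > 0) + (1 - q) I(u <= 0)} clip(u, -c, c); the factor 2
    cancels in weighted least squares and is dropped.
    """
    asym = np.where(u > 0, q, 1.0 - q)
    return asym * c / np.maximum(np.abs(u), c)       # = asym * min(1, c / |u|)

def mquantile_fit(X, y, q, c=1.345, tol=1e-8, max_iter=500):
    """Linear M-quantile regression of order q fitted by IRLS (illustrative sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745            # robust scale of the residuals
        w = mq_weights(r / s, q, c)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical data: one covariate, homoscedastic Gaussian errors.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
fits = {q: mquantile_fit(X, y, q) for q in (0.1, 0.5, 0.9)}
print({q: np.round(b, 3) for q, b in fits.items()})
```

Fitting the same design at several values of q traces out how the slope coefficients, i.e. the implicit prices in a linear specification, change across the conditional price distribution.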

The spatial semiparametric model at the M-quantile q (SSPMQ hereafter) is now given as follows:

$$\begin{aligned} {MQ}_Y\left( q | {\varvec{x}}, t, {\textbf{s}}_i;\psi \right) ={\varvec{x}}'{\varvec{\beta }}_q+f_{1q}(t) + f_{2q}({\textbf{s}}_i), \end{aligned}\tag{3}$$

where $f_{1q}(\cdot )$ and $f_{2q}(\cdot )$ are two unknown, arbitrary smooth functions, ${\textbf{s}}_i \in {\mathbb {R}}^2$ denotes the geographical coordinates of unit i, and t is the Cultural Catalyst.

In the rest of this paper, $f_{1q}(\cdot )$ is a penalised spline that relies on a set of univariate quadratic basis functions, i.e.

$$\begin{aligned} f_{1q}(t) = \sum _{j=1}^{K_1} b_{1j}(t)\theta _{1jq} \end{aligned}\tag{4}$$

where $(b_{1j}(t), j=1, \ldots , K_1)$ and $\theta _{1jq}$ are the basis functions and the M-quantile-specific spline coefficients, respectively. In vector form, the spline is written as

$$\begin{aligned} f_{1q}(t)={{\varvec{x}}}'_{{\varvec{t}}}{\varvec{\beta }}_{1q}+{{\varvec{z}}'}_1{\varvec{\gamma }}_{1q}, \end{aligned}$$

(5)

where ${{\varvec{x}}}'_{{\varvec{t}}}=\left[ 1,t,t^2\right]$, ${{\varvec{z}}}_1=\left[ \left( t-k_j\right) _+^2: j=1, \ldots ,K_1\right]$, $(x)_+^2$ denotes the function $x^2I\{x>0\}$, with $I\{x>0\}$ the indicator function of the set $x>0$, $k_j$ is the j-th knot of the spline and $K_1$ is the number of spline knots.
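The truncated quadratic basis of Equation (5) can be built directly from these definitions. The sketch below returns the fixed part $[1, t, t^2]$ and the truncated part $[(t-k_j)_+^2]$ as two matrices. Placing the knots at equally spaced sample quantiles of t is an illustrative assumption only: the paper locates knots with clara. The helper names are hypothetical.

```python
import numpy as np

def truncated_quadratic_basis(t, knots):
    """Design matrices for f_1q(t) = x_t' beta_1q + z_1' gamma_1q (Equation 5).

    Returns X_t with rows [1, t, t^2] and Z_1 with rows [(t - k_j)_+^2, j = 1..K_1].
    """
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X_t = np.column_stack([np.ones_like(t), t, t ** 2])
    # (t - k_j)_+^2: negative differences are set to zero before squaring
    Z_1 = np.clip(t[:, None] - knots[None, :], 0.0, None) ** 2
    return X_t, Z_1

def quantile_knots(t, n_knots):
    """Hypothetical knot rule: equally spaced interior sample quantiles of t."""
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return np.quantile(np.asarray(t, dtype=float), probs)
```

For example, `truncated_quadratic_basis([0.0, 0.5, 1.0], [0.25, 0.75])` gives a truncated part with rows `[0, 0]`, `[0.0625, 0]` and `[0.5625, 0.0625]`.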

In Equation (3), the function

$$\begin{aligned} f_{2q}({\textbf{s}}) = \sum _{j=1}^{K_2} b_{2j}({\textbf{s}})\theta _{2jq} \end{aligned}$$

(6)

is an M-quantile-specific bivariate thin plate spline that accounts for the spatial trends in prices; $(b_{2j}({\textbf{s}}), j=1, \ldots , K_2)$ and $\theta _{2jq}$ are the bivariate basis functions and the M-quantile-specific spline coefficients, respectively. In vector form, the spline is specified as follows:

$$\begin{aligned} f_{2q}({\textbf{s}})= {\varvec{x}}_{\textbf{s}}' {\varvec{\beta }}_{2q} + {\varvec{z}}'_2 {\varvec{\gamma }}_{2q}, \end{aligned}$$

(7)

where ${\varvec{x}}_{\textbf{s}}'=\left[ 1, s_1, s_2 \right]$; $K_2$ is the number of spline knots; ${\varvec{z}}'_2$ is a row of the $n \times K_2$ spline matrix ${{\textbf{Z}}}_{sp}$; and ${\varvec{\gamma }}_{2q}$ is a $K_2$-dimensional vector of M-quantile-specific spline coefficients. The bivariate spline matrix is defined (Opsomer et al. 2008) as:

$$\begin{aligned} {{\textbf{Z}}}_{sp}= \left[ C({\textbf{s}}_i-{\textbf{k}}_j) \right] _{1 \le j \le K_2}^{1 \le i \le n} \left[ C({\textbf{k}}_j-{\textbf{k}}_k) \right] _{1 \le j, k \le K_2}^{-1/2} \end{aligned}$$

(8)

where ${\textbf{k}}_j$ and ${\textbf{k}}_k$, $j,k =1, \ldots , K_2$, are two-dimensional vectors representing the cartographic coordinates of knots j and k; ${\textbf{s}}_i$ is a two-dimensional vector representing the cartographic coordinates of sampling location i; $C({\textbf{s}})=\left\| {\textbf{s}}\right\| _2^2 \log \left\| {\textbf{s}}\right\| _2$, where ${\textbf{s}}\in {\mathbb {R}}^2$; and $\left\| {\textbf{s}}\right\| _2$ is the Euclidean norm of ${\textbf{s}}$ in ${\mathbb {R}}^2$.
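Equation (8) can be computed as follows. We take $C(\textbf{0}) = 0$, which is the limit of $\|\textbf{s}\|^2\log\|\textbf{s}\|$. The paper does not say how the matrix power $-1/2$ is computed. This sketch assumes an SVD-based square root of the knot matrix, a common practical choice, and the function names are hypothetical.

```python
import numpy as np

def tps_radial(d):
    """C(s) = ||s||^2 log ||s|| over the last axis, with C(0) = 0 (its limit)."""
    r = np.linalg.norm(d, axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def thin_plate_spline_matrix(s, knots):
    """n x K2 matrix Z_sp = [C(s_i - k_j)] [C(k_j - k_k)]^(-1/2) (Equation 8)."""
    s = np.asarray(s, dtype=float)
    knots = np.asarray(knots, dtype=float)
    Z_K = tps_radial(s[:, None, :] - knots[None, :, :])        # n x K2
    Omega = tps_radial(knots[:, None, :] - knots[None, :, :])  # K2 x K2
    # Assumed square root of Omega via SVD: U diag(sqrt(d)) V'
    U, d, Vt = np.linalg.svd(Omega)
    sqrt_Omega = U @ np.diag(np.sqrt(d)) @ Vt
    # Z_K @ inv(sqrt_Omega), computed with a linear solve instead of an explicit inverse
    return np.linalg.solve(sqrt_Omega.T, Z_K.T).T
```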

In matrix notation, the spline terms in Equations (4) and (6) are ${\textbf{f}}_{hq} = \mathbf {B_h}\varvec{\theta }_{hq}$, $h=1,2$, where $\mathbf {f_{1q}} = [f_{1q}(t_1)\ldots f_{1q}(t_n)]^T$ and $\mathbf {f_{2q}} = [f_{2q}({\textbf{s}}_1)\ldots f_{2q}({\textbf{s}}_n)]^T$; $\varvec{\theta }^T_{hq}=({\varvec{\beta }}_{hq}^T,{\varvec{\gamma }}_{hq}^T)$ is the $q^{th}$ M-quantile-specific vector of coefficients, and $\mathbf {B_h}$ is the corresponding spline basis matrix.

The smooth terms $f_{hq}(\cdot )$, $h=1,2$, in Equation (3) introduce an identification problem (Wood 2017): since both ${{\varvec{x}}}_{{\varvec{t}}}$ and ${\varvec{x}}_{\textbf{s}}$ contain a constant, each smooth term is confounded with the model intercept. To address this problem, we define the column-centred matrix ${\tilde{\textbf{B}}_h}=\mathbf {B_h} - {\textbf{1}}{\textbf{1}}^T\mathbf {B_h}/n$, calculate ${\tilde{\textbf{f}}_{hq}} = {\tilde{\textbf{B}}_h}\varvec{\theta }_{hq}$ and use ${\tilde{\textbf{f}}_{hq}}$ in the semiparametric linear predictor. To simplify the notation, we use $\mathbf {f_{hq}}$ instead of ${\tilde{\textbf{f}}_{hq}}$ in the rest of the paper. The spline knots are determined by the clustering method clara, available in R (R Core Team 2020) through the cluster package, applied to the sample values of the geographical coordinates.
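clara (Clustering LARge Applications) runs k-medoids on subsamples of the data. The sketch below substitutes a plain alternating k-medoids on the full sample, which is a simplification and not clara itself. It then applies the column centring $\tilde{\textbf{B}}_h = \textbf{B}_h - \textbf{1}\textbf{1}^T\textbf{B}_h/n$. The function names and the random initialisation are hypothetical choices.

```python
import numpy as np

def kmedoids_knots(points, k, n_iter=50, seed=0):
    """Simplified alternating k-medoids (stand-in for clara); returns k knot locations."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    medoid_idx = rng.choice(len(points), size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest current medoid
        dist = np.linalg.norm(points[:, None, :] - points[medoid_idx][None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # New medoid: member minimising total distance to the rest of its cluster
            pts = points[members]
            within = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1).sum(axis=1)
            new_idx[j] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break
        medoid_idx = new_idx
    return points[medoid_idx]

def centre_columns(B):
    """Column centring B - 1 1' B / n used to identify the smooth terms."""
    B = np.asarray(B, dtype=float)
    return B - B.mean(axis=0, keepdims=True)
```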

The SSPMQ model is estimated by iteratively reweighted penalised least squares, solving the following estimating equations (Pratesi et al. 2009):

$$\begin{aligned} \sum _{i=1}^{n} \psi _q(y_i -{\textbf{z}}^T _i \varvec{\eta }_{q}) {\textbf{z}}_i - \uplambda _{1q} {\textbf{D}}_1 \varvec{\eta }_{q} - \uplambda _{2q}{\textbf{D}}_2\varvec{\eta }_{q}= {\textbf{0}}, \end{aligned}$$

(9)

where $\varvec{\eta }_{q}=({\varvec{\beta }}^T_{q}, {\varvec{\theta }}^T_{1q}, {\varvec{\theta }}^T_{2q})^T$, ${\textbf{z}}^T_i = ({\varvec{x}}^T_i, {\textbf{b}}_1(t_i)^T,{\textbf{b}}_2({\textbf{s}}_i)^T)$, ${\textbf{D}}_1$ and ${\textbf{D}}_2$ are two penalty matrices, and $\uplambda _{1q}$ and $\uplambda _{2q}$ are smoothing parameters estimated via external cross-validation. In particular, $\Lambda _q=(\uplambda _{1q},\uplambda _{2q})$ is obtained by minimising the Generalised Cross-Validation (GCV) criterion:

$$\begin{aligned} GCV(\Lambda _q) = \dfrac{||({\textbf{I}}-{\varvec{S}}_{\Lambda _q}){\textbf{y}}||^2}{(1-n^{-1}\delta \textrm{tr}({\varvec{S}}_{\Lambda _q}))^2}, \end{aligned}$$

(10)

where ${\varvec{S}}_{\Lambda _q}$ is a smoother-type matrix associated with $MQ_Y\left( {\varvec{x}}, t, {\textbf{s}}_i;\psi \right)$, and $\delta$ is a penalisation term for the additional degrees of freedom given by the trace of the smoother matrix (Pratesi et al. 2009).
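Equation (10) can be evaluated numerically once a smoother matrix is available. The sketch assumes the weighted penalised hat matrix $\textbf{S} = \textbf{Z}(\textbf{Z}^T\textbf{A}\textbf{Z} + \sum_h \uplambda_{hq}\textbf{D}_h)^{-1}\textbf{Z}^T\textbf{A}$ at the current IRLS weights. This is our reading of "smoother-type matrix", not necessarily the exact form used by Pratesi et al. (2009). The helper names are hypothetical.

```python
import numpy as np

def smoother_matrix(Z, alpha, penalties, lambdas):
    """S = Z (Z'AZ + sum_h lambda_h D_h)^{-1} Z'A, with A = diag(alpha) (assumed form)."""
    ZtA = Z.T * alpha  # Z'A without forming the n x n diagonal matrix
    M = ZtA @ Z + sum(lam * D for lam, D in zip(lambdas, penalties))
    return Z @ np.linalg.solve(M, ZtA)

def gcv(y, S, delta=3.0):
    """GCV of Equation (10); delta inflates the effective degrees of freedom tr(S)."""
    n = len(y)
    resid = y - S @ y
    return (resid @ resid) / (1.0 - delta * np.trace(S) / n) ** 2
```

With no penalty and unit weights, $\textbf{S}$ is the ordinary hat matrix and its trace equals the number of columns of $\textbf{Z}$. Increasing $\uplambda$ shrinks the trace, which is the degrees-of-freedom term that $\delta$ penalises.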

The estimation procedure is as follows:

1. Select an initial value $\varvec{\eta }_{q}^{0}$.
2. At each iteration step r, calculate the residuals $e_{iq}^{r-1} = y_i -{\textbf{z}}^T _i \varvec{\eta }_{q}^{r-1}$ and the associated weights $\alpha _{iq}^{r-1} = \psi _q(e_{iq}^{r-1})/e_{iq}^{r-1}$.
3. Optimise GCV($\Lambda _q$) over a fine grid of values of $\Lambda _q$ to obtain $\Lambda _q^\star =(\uplambda ^\star _{1q},\uplambda ^\star _{2q})$.
4. Calculate the new weighted penalised least squares estimates as follows:
$$\begin{aligned} \varvec{{\widehat{\eta }} }_{q}^{r} = [{{\textbf{Z}}}^T{\textbf{A}}^{r-1}{{\textbf{Z}}} + \uplambda ^\star _{1q} {\textbf{D}}_1 + \uplambda ^\star _{2q}{\textbf{D}}_2]^{-1}{{\textbf{Z}}}^{T}{\textbf{A}}^{r-1}{\textbf{y}}, \end{aligned}$$
(11)
where ${{\textbf{Z}}}$ is the matrix whose i-th row is ${\textbf{z}}^T_i$, $i=1,\ldots ,n$, and ${\textbf{A}}^{r-1}$ is a diagonal matrix of weights with diagonal elements $\alpha _{iq}^{r-1}$.
5. Iterate steps 2-4 until convergence.
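The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' R implementation. It assumes:

- the Huber-type $\psi_q$, applied to residuals scaled by a MAD estimate that is recomputed at each iteration;
- ordinary least squares starting values;
- user-supplied penalty matrices, typically zero on the fixed-effect columns and the identity on the truncated-basis coefficients.

The names `fit_sspmq` and `irls_weights` are hypothetical. The GCV search recomputes the full $n\times n$ smoother matrix for every grid point, so the sketch is only practical for small examples.

```python
import itertools
import numpy as np

def irls_weights(e, q, c=1.345):
    """alpha_i = psi_q(u_i)/u_i for scaled residuals u_i = e_i / s (Huber-type psi_q assumed)."""
    s = max(np.median(np.abs(e)) / 0.6745, 1e-12)  # MAD scale estimate
    u = e / s
    clip_ratio = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))  # Huber psi(u)/u
    return 2.0 * clip_ratio * np.where(u > 0, q, 1.0 - q)

def fit_sspmq(y, Z, D1, D2, q, grid, delta=3.0, c=1.345, max_iter=200, tol=1e-8):
    n = len(y)
    eta = np.linalg.lstsq(Z, y, rcond=None)[0]           # step 1: OLS start (assumption)
    lam = (grid[0], grid[0])
    for _ in range(max_iter):
        alpha = irls_weights(y - Z @ eta, q, c)           # step 2: residuals and weights
        ZtA = Z.T * alpha                                 # Z'A with A = diag(alpha)
        best = np.inf
        for l1, l2 in itertools.product(grid, grid):      # step 3: GCV grid search
            S = Z @ np.linalg.solve(ZtA @ Z + l1 * D1 + l2 * D2, ZtA)
            r = y - S @ y
            score = (r @ r) / (1.0 - delta * np.trace(S) / n) ** 2
            if score < best:
                best, lam = score, (l1, l2)
        M = ZtA @ Z + lam[0] * D1 + lam[1] * D2
        eta_new = np.linalg.solve(M, ZtA @ y)             # step 4: Equation (11)
        converged = np.max(np.abs(eta_new - eta)) < tol * (1.0 + np.max(np.abs(eta)))
        eta = eta_new
        if converged:                                     # step 5: stop at convergence
            break
    return eta, lam
```

In a full SSPMQ fit, the columns of `Z` would be the fixed covariates followed by the centred spline bases.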

The procedure above is implemented in R (R Core Team 2020), and the R functions are available from the authors upon request.

From Equation (11), simple algebra shows that the variance-covariance matrix of the estimated coefficients of the semiparametric M-quantile regression model can be approximated by:

$$\begin{aligned} \textrm{var}(\varvec{{\widehat{\eta }}}_{q}) = ({{\textbf{Z}}}^T {\textbf{A}} {{\textbf{Z}}}+ \uplambda _{1q}{\textbf{D}}_1 + \uplambda _{2q}{\textbf{D}}_2)^{-1} {{\textbf{Z}}}^T {\textbf{A}} {{\textbf{Z}}}({{\textbf{Z}}}^T {\textbf{A}} {{\textbf{Z}}}+ \uplambda _{1q}{\textbf{D}}_1 + \uplambda _{2q}{\textbf{D}}_2)^{-1} \sigma ^2. \end{aligned}$$

(12)

An estimate of this variance-covariance matrix can be obtained by plugging in the sample estimates of $\varvec{\eta }_q$ and of the error variance $\sigma ^2$, together with the final values of the smoothing parameters. The error variance can be estimated by the median absolute deviation (MAD) of the residuals, ${\hat{\sigma }}^2=(\text {median}(|y_i -{\textbf{z}}^T _i\varvec{{\widehat{\eta }}}_{q}|)/0.6745)^2$ (Chambers and Tzavidis 2006). The asymptotic theory of the M-quantile coefficient estimators in nonparametric M-quantile regression is discussed in Pratesi et al. (2009). Bianchi et al. (2018) showed that M-quantile estimates can be obtained by maximum likelihood, assuming Generalised Asymmetric Least Informative distributed error terms, and adapted the usual testing procedures to M-quantile regression.
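The plug-in estimate of Equation (12), with the MAD estimate of $\sigma^2$, can be sketched as follows. The combined penalty $\uplambda_{1q}\textbf{D}_1 + \uplambda_{2q}\textbf{D}_2$ and the final IRLS weights are assumed to come from the fitted model. The helper names are hypothetical.

```python
import numpy as np

def mad_sigma2(resid):
    """sigma^2 estimate: (median |residual| / 0.6745)^2."""
    return (np.median(np.abs(np.asarray(resid, dtype=float))) / 0.6745) ** 2

def coef_covariance(Z, alpha, penalty, sigma2):
    """Sandwich form of Equation (12).

    penalty: lambda_1q * D_1 + lambda_2q * D_2 at the final smoothing parameters;
    alpha:   final IRLS weights (diagonal of A).
    """
    ZtAZ = (Z.T * alpha) @ Z
    bread = np.linalg.inv(ZtAZ + penalty)
    return bread @ ZtAZ @ bread * sigma2
```

Standard errors of the coefficients are then `np.sqrt(np.diag(V))`. Without a penalty and with unit weights, the formula reduces to $\sigma^2(\textbf{Z}^T\textbf{Z})^{-1}$.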

The variance in Equation (12) can be used to assess the statistical significance of the spline term by calculating a pointwise variability band around the estimated curve and checking whether it includes the horizontal axis. The band is constructed using an approach similar to that suggested by Ruppert et al. (2003) for ordinary semiparametric spline regression. Starting from the estimated version of Equation (5), simple algebra gives the variance-covariance matrix of the spline term:

$$\begin{aligned} \mathrm {{\widehat{var}}}({\hat{f}}_{1q}(t))=({{\varvec{x}}}_{{\varvec{t}}},{{\varvec{z}}}_1) \textrm{var}( \widehat{\varvec{\theta }}_{1q} )({{\varvec{x}}}_{{\varvec{t}}},{{\varvec{z}}}_1)^T. \end{aligned}$$

(13)

The pointwise variability band is given by ${\hat{f}}_{1q}(t) \pm 2 \cdot \sqrt{\mathrm {{\widehat{var}}}({\hat{f}}_{1q}(t))}.$
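Equation (13) and the $\pm 2$ standard-error band can be evaluated on a grid of t values as follows. The sketch assumes that the estimated coefficients $\widehat{\varvec{\theta}}_{1q}$ and their covariance block (taken from Equation 12) are ordered as $[1, t, t^2]$ followed by the truncated terms. If the centred basis is used in the fit, the grid rows should be centred in the same way. The function name is hypothetical.

```python
import numpy as np

def spline_variability_band(t_grid, knots, theta_1q, cov_theta_1q, width=2.0):
    """Pointwise band f_hat(t) +/- width * sqrt(var f_hat(t)) from Equation (13)."""
    t = np.asarray(t_grid, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Row vectors (x_t', z_1') of Equation (5) evaluated at each grid point
    X_t = np.column_stack([np.ones_like(t), t, t ** 2])
    Z_1 = np.clip(t[:, None] - knots[None, :], 0.0, None) ** 2
    B = np.hstack([X_t, Z_1])
    f_hat = B @ theta_1q
    # Diagonal of B var(theta) B', computed row by row
    var_f = np.einsum("ij,jk,ik->i", B, cov_theta_1q, B)
    se = np.sqrt(var_f)
    return f_hat, f_hat - width * se, f_hat + width * se
```

The spline term is judged significant where the band excludes zero.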

5 Modelling housing prices in Milan using semiparametric M-quantile regression

This section presents the regression results. The specification of the SSPMQ model described in Sect. 4 includes housing-specific covariates as well as the urban amenities discussed in Sect. 3 (Note 2).

Table 3 shows the estimated coefficients for each M-quantile, with the penalisation term $\delta =3$ in Equation (10) (Note 3). We set the number of knots to $K_1=20$ for $f_{1q}$ and $K_2=40$ for $f_{2q}$; the knots are located using the clara algorithm implemented in R. We tested the impact of using a different number of knots and found the results to be very stable. A general rule of thumb is to place one knot every 4 or 5 observations. However, for large datasets this leads to an excessive number of knots (and therefore of parameters), making the computational burden extremely heavy, so capping the number of knots may be advisable. In any case, once a certain minimum is reached, the number of knots does not seem crucial for penalised regression splines (Pratesi et al. 2009; Ruppert et al. 2003).

Table 3 Results of SSPMQ ($q=0.10,0.25, 0.50, 0.75, 0.90$) estimated using 20 knots for $f_{1q}$ and 40 knots for $f_{2q}$ - $\delta =3\dag$


All covariates act in the direction expected a priori. Most of them are statistically significant and are valued differently at different points of the conditional distribution of house prices. This is also clearly displayed in Fig. 4, which shows the effect of each variable at different M-quantiles. Grey bands represent the 95% confidence interval of the parameter of interest. We have also added a blue line to each graph to represent the regression coefficients obtained by fitting the standard semiparametric additive model on the mean specified in Equation (1). This allows us to assess whether, and for which variables, there is a heterogeneous effect on the distribution of the response.

Some housing-specific attributes have a different impact at different levels of the price distribution, suggesting that they are valued differently along it. The floor on which the house is located is statistically significant only for very low- and medium-value houses. A nonconstant effect of floor level along the price distribution has also been found by Amédée-Manesme et al. (2017) in Paris: the higher the price category, the lower the premium assigned to a given floor level. In contrast, our analysis suggests that, at the lowest M-quantiles, a medium floor (second or third floor) is valued lower than a high floor (fourth floor or above) or a low floor (ground or first floor). In line with Michelangeli and Zanardi (2009), we find that the age of the building has a positive effect on housing prices, and that this effect is more pronounced at the top of the housing price distribution. This can be explained by the fact that the oldest buildings tend to be located in the most elegant districts of the city centre; moreover, many of them are architecturally interesting and/or have beautiful gardens inside the courtyard. It is therefore reasonable for the price differential to be higher for very high-valued houses. The presence of a heating system does not significantly affect the price distribution, a result also found by Brambilla et al. (2013) using a standard model on the mean. Other housing-specific attributes, such as the presence of an elevator and of more than one bathroom in the housing unit, have a positive and significant effect that remains quite stable across M-quantiles.

Some patterns of heterogeneity have also been found for the environmental characteristics. Neighbourhood degradation (measured by the “abandoned area” variable) contributes negatively to the house price and affects average- and high-valued houses more strongly. In contrast, it is not statistically significant for very low-valued houses ($q=0.1$). This is not surprising: “environmental quality is very much like leisure time: as people become wealthier, they demand more of it, mostly because they can better afford it” (Boudreaux 2008). The effect of proximity to a university follows an inverted U-shape across M-quantiles, peaking at the centre of the outcome distribution. This result appears reasonable: luxury properties are typically less likely to be of interest to students, who largely represent the demand for housing close to universities. Quite surprisingly, we have found that proximity to a metro station has a significant negative effect on the housing price at $q=0.5$ and $q=0.75$. This result suggests that the nuisance and congestion created by a station are not always compensated by the benefit of direct access to public transport. The 2008 global economic crisis had a significant detrimental effect on the Milan housing market only for low- or average-valued houses; it did not significantly affect the price of high-valued houses, which were less impacted by the negative shock. Finally, for many of the variables in our model, the impact at different M-quantiles differs from the impact on the mean (blue line in the graphs), which often lies outside the $95\%$ confidence interval. This supports the idea that there is remarkable heterogeneity in housing demand and prices due to the structural and environmental characteristics of the property.

The estimated effect of housing size at the five M-quantiles $q= 0.1, 0.25, 0.5, 0.75, 0.9$ is reported in Fig. 5. The curves in Fig. 5 have been constructed for a reference housing unit located at the barycentre of the municipality, with the Cultural Catalyst set to 0.6, sold in 2008 or later, with two or more bathrooms, a heating system that is not autonomous, and a parking area or a garage. The reference unit is also assumed to be located on a medium floor (second or third floor) in a building built after 1950 with at least one elevator, far from abandoned sites, metro stations and university sites. The plot shows that the impact of housing area on prices tends to be similar at different M-quantiles. However, the estimated curve for the highest M-quantile becomes concave for larger houses, suggesting that buyers of more luxurious units attribute a progressively lower value to additional house size.

The spline effect $(f_{1q})$ of the Cultural Catalyst at the five M-quantiles $q= 0.1,0.25, 0.5, 0.75, 0.9$ is depicted in Fig. 6. The curves in panels (a) and (b) have been constructed for the same reference housing unit used for Fig. 5, with the (standardised) housing area set to 0.5. Because the model is additive, the choice of reference unit affects only the intercept of the spline, not its shape. Figure 6(a) shows a nearly linear or slightly concave shape for the other M-quantiles, whereas a clearly concave shape is observed for $q=0.9$. This suggests that, as with environmental quality (Boudreaux 2008), people are willing to pay more for culture as they become richer, probably because they can better afford it. However, this higher willingness to pay for culture increases at a decreasing rate.

Figure 7 displays the spline effect of the Cultural Catalyst at $q=0.25, 0.50, 0.75$, with the variability band of the curve, as defined in Sect. 4, represented by the coloured area. The bands lie largely above zero, suggesting a significant positive effect of cultural amenities on housing prices at all the considered M-quantiles. Furthermore, the bands at different M-quantiles are clearly separated, suggesting that the effect of the Cultural Catalyst differs significantly across levels of the price distribution.

Figure 8 shows the spatial dynamics of prices estimated by the bivariate thin plate component at the five considered M-quantiles. It is not surprising that higher prices tend to concentrate in the central area of the city and decrease when moving to the outskirts regardless of whether the house is a low, medium or high-valued unit.

To evaluate the goodness of fit of our model, we calculate the pseudo-$R_{\rho }^2$ measure proposed by Bianchi et al. (2018), $R^2_\rho (q)=1-\dfrac{\sum _{i=1}^{n}\rho _q(e_{iq})}{\sum _{i=1}^{n}\rho _q({\tilde{e}}_{iq})}$, for several M-quantiles. Here $e_{iq}$ are the scaled residuals under the full model, ${\tilde{e}}_{iq}$ are the scaled residuals under the null model (i.e. the model in which all coefficients except the intercept are set to zero), q is the M-quantile order, and $\rho$ is the loss function defined in Equation (2). For all the considered M-quantile models ($q=0.1, 0.25, 0.5, 0.75$ and 0.9), the M-quantile regression performs reasonably well, with $R^2_\rho (q)$ ranging from $40\%$ to $75\%$ and increasing with the quantile order. These pseudo-$R^2_\rho (q)$ values are larger than those of the corresponding quantile regressions (not reported in detail here), suggesting that the proposed approach is more appropriate than quantile regression for the data at hand.
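The pseudo-$R^2_\rho(q)$ can be computed from the two sets of scaled residuals. Equation (2) is outside this excerpt, so the sketch assumes an asymmetric Huber loss whose derivative is the Huber-type $\psi_q$. It also assumes the caller has already divided the residuals by a scale estimate. Any constant factor in $\rho_q$ cancels in the ratio. The names are hypothetical.

```python
import numpy as np

def rho_q(u, q, c=1.345):
    """Asymmetric Huber loss whose derivative is the asymmetric psi_q (assumed form)."""
    u = np.asarray(u, dtype=float)
    a = np.abs(u)
    huber = np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)
    return 2.0 * huber * np.where(u > 0, q, 1.0 - q)

def pseudo_r2(e_full, e_null, q, c=1.345):
    """R^2_rho(q) = 1 - sum rho_q(e_full) / sum rho_q(e_null), on scaled residuals."""
    return 1.0 - rho_q(e_full, q, c).sum() / rho_q(e_null, q, c).sum()
```

A value of 0 means the covariates do not reduce the loss relative to the intercept-only model. A value of 1 corresponds to zero residual loss.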

Finally, it is worth showing how the implicit price associated with the Cultural Catalyst may be determined for any desired M-quantile. Let t be the value of the Cultural Catalyst and let $P_q(t)$ be the q-th M-quantile of the price distribution as a function of t, assuming that all of the other covariates in Equation (3) are fixed. To calculate the implicit price at any value t, it is necessary to estimate the derivative of $P_q(t)$ at t for every M-quantile of interest (see Sect. 2). By differentiating Equation (3) with respect to t, we obtain:

$$\begin{aligned} P_q'\left( t\right) =f'_{1q}(t) \end{aligned}$$

where $f'_{1q}(t)$ is the derivative of the spline transformation at t.

From Equation (5), we find that an estimate of $f'_{1q}(t)$ can be obtained as follows:

$$\begin{aligned} {\hat{f}}'_{1q}(t)=\ {\dot{{\varvec{x}}}'}_t{\widehat{\varvec{\beta }}}_{1q}+{\dot{{\varvec{z}}}'}_{1}{\widehat{\varvec{\gamma }}}_{1q} \end{aligned}$$

(14)

where ${\dot{{\varvec{x}}}'}_t=\left[ 0,\ 1,\ 2t\right]$, ${\dot{{\varvec{z}}}'}_{1}={\left[ 2\left( t-k_j\right) _+: j=1,\ldots ,K_1\right] }$ and, as above, $k_j$ is the j-th knot of the spline, while $\widehat{\varvec{\beta }}_{1q}$ and $\widehat{\varvec{\gamma }}_{1q}$ are the estimates of the regression coefficients associated with the spline basis.
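Equation (14) is the term-by-term derivative of the truncated quadratic basis. The sketch below evaluates it and checks it against a central finite difference of Equation (5). The coefficient values and function names are illustrative only, and the sketch does not reproduce the paper's estimates.

```python
import numpy as np

def f1_spline(t, knots, beta_1q, gamma_1q):
    """f_1q(t) from Equation (5): [1, t, t^2] beta + sum_j (t - k_j)_+^2 gamma_j."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.asarray(knots, dtype=float)
    X_t = np.column_stack([np.ones_like(t), t, t ** 2])
    Z_1 = np.clip(t[:, None] - k[None, :], 0.0, None) ** 2
    return X_t @ beta_1q + Z_1 @ gamma_1q

def f1_derivative(t, knots, beta_1q, gamma_1q):
    """Equation (14): [0, 1, 2t] beta + sum_j 2 (t - k_j)_+ gamma_j."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.asarray(knots, dtype=float)
    x_dot = np.column_stack([np.zeros_like(t), np.ones_like(t), 2.0 * t])
    z_dot = 2.0 * np.clip(t[:, None] - k[None, :], 0.0, None)
    return x_dot @ beta_1q + z_dot @ gamma_1q
```

The approximate price change for a small increment dt (0.01 in the text) is then `f1_derivative(t, ...) * dt`.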

Using Equation (14), the implicit price has been calculated for M-quantiles 0.10, 0.25, 0.5, 0.75 and 0.90 at the median sample value of the Cultural Catalyst for the reference housing unit described before. Assuming that the amount of t changes by a small quantity dt, say 0.01, the increase in the price per square metre is approximately €38.8 for $q=0.10$, €49.1 for $q=0.25$, €63.6 for $q=0.50$, €76.3 for $q=0.75$ and €86.2 for $q=0.90$, suggesting that households with low-priced properties behave differently than households with high-priced housing in terms of the marginal willingness to pay for culture. The latter households attribute a greater value to a marginal increase in cultural amenities.

6 Conclusions

The aim of this paper was to employ an M-quantile approach to examine how the effect of housing characteristics may vary across the conditional distribution of house prices, while preserving the robustness and efficiency of the estimators of the regression parameters. More specifically, a semiparametric M-quantile regression was proposed for the residential housing market of Milan. We included a spline component in the model to estimate the potential nonlinear effect of the Cultural Catalyst, an index of cultural amenities. We also considered other urban amenities, such as the presence of abandoned sites, metro stations and university sites. Our findings suggest that the effects of several housing attributes differ significantly across the response distribution, supporting the choice of estimating the conditional M-quantile functions in addition to the conditional mean (Liao and Wang 2012). High-income residents are more concerned about environmental quality and are willing to pay a higher price for an improvement in the surroundings of the unit. Similarly, cultural amenities have a stronger positive effect on high-valued houses. At the top of the distribution of prices, the impact of cultural amenities increases more than linearly as its quantity increases. These results suggest that people tend to demand more cultural amenities and environmental quality as they become wealthier, mostly because they can better afford them. Proximity to university sites has been found to significantly increase the price of low- and average-value houses, where the effect is fairly stable, whereas the impact on high-value units is negligible. The latter units are presumably of less interest to students, who largely represent the demand for housing near a university site.
Some specific attributes of the house, such as its size and the presence of a lift or a parking area, are much more valuable for high-value units, although their impact has also been found to be statistically significant at lower levels of the price distribution.

The nonparametric nature of the proposed approach permits one to avoid the ubiquitous log transformation of prices in regression analysis. The log transformation has important drawbacks when, as in the present paper, one is interested in predicting implicit prices of amenities and housing-specific attributes, since the log-prices must be back-transformed to the raw scale. Simple exponentiation of the predicted log price yields naive estimates of the implicit prices that are biased downwards. One way to adjust for this bias is to assume a log-normal distribution of the residuals; this assumption is often difficult to test and is indeed rarely tested in this strand of literature. A second method is the transformation bias correction discussed by Chambers and Clark (2012). On the one hand, this correction does not require any particular distributional model; on the other hand, it requires calibrating the naive estimates of the implicit prices by a data-based factor that reduces, but does not eliminate, the bias of the final estimates. We also note that the log transformation is often used to mitigate the influence of extreme raw-scale values. In our case, this does not occur, since the percentage of outliers remains basically the same on the log scale. Moreover, log-transformed data are susceptible to ‘small’ outliers (i.e. values close to zero). This again may increase the variability of parameter estimates on the log scale, which further justifies the M-quantile approach that downweights outliers. More generally, the M-quantile approach has permitted housing prices to be modelled in a natural manner, avoiding strong assumptions and preserving the statistical efficiency of the estimators of the regression coefficients. This last point is perhaps the main advantage of the approach: the tuning constant of the influence function controls the balance between robustness and efficiency of the estimators (see Sect. 4).
Moreover, unlike quantile regression, which relies on the absolute value loss function, M-quantile regression allows a choice among several continuous influence functions, which offers additional computational stability.

Finally, the methodology employed in this paper has proven to be extremely flexible. We showed how it can be straightforwardly coupled with semiparametric specifications that account for potentially nonlinear effects and spatial dynamics. We also believe that this methodology has a potentially wide range of applications in the residential housing market, from identifying housing submarkets to designing tax systems or financing local public goods, such as culture and better environmental conditions.

Notes

1. Unfortunately, we do not have information on other variables for the quality of education, such as the percentage of pupils moving up to a higher class or parameters for classroom and/or building facilities.
2. To improve the numerical stability of the estimates, the Cultural Catalyst has been scaled between 0 and 1 in all the statistical analyses presented in the paper.
3. We considered different values of the penalisation term and found the estimated coefficients to be stable. The results are available upon request.

References

Alfò, M., Salvati, N., Ranalli, M.G.: Finite mixtures of quantile and M-quantile regression models. Stat. Comput. 27(2), 547–570 (2017)
Amédée-Manesme, C.O., Baroni, M., Barthélémy, F., et al.: Market heterogeneity and the determinants of Paris apartment prices: a quantile regression approach. Urban Stud. 54(14), 3260–3280 (2017)
Bayer, P., McMillan, R., Rueben, K.: An equilibrium model of sorting in an urban housing market. Technical report, National Bureau of Economic Research (2004)
Bianchi, A., Fabrizi, E., Salvati, N., et al.: Estimation and testing in M-quantile regression with applications to small area estimation. Int. Stat. Rev. 86(3), 541–570 (2018)
Borgoni, R., Del Bianco, P., Salvati, N., et al.: Modelling the distribution of health-related quality of life of advanced melanoma patients in a longitudinal multi-centre clinical trial using M-quantile random effects regression. Stat. Methods Med. Res. 27(2), 549–563 (2018)
Borgoni, R., Michelangeli, A., Pontarollo, N.: The value of culture to urban housing markets. Reg. Stud. 52(12), 1672–1683 (2018)
Borgoni, R., Degli Antoni, G., Faillo, M., et al.: Natives, immigrants and social cohesion: intra-city analysis combining the hedonic approach and a framed field experiment. Int. Rev. Appl. Econ. 33(5), 697–711 (2019)
Boudreaux, D.: Globalization. Greenwood Press, Westport (2008)
Brambilla, M., Michelangeli, A., Peluso, E.: Equity in the city: on measuring urban (ine)quality of life. Urban Stud. 50(16), 3205–3224 (2013)
Breckling, J., Chambers, R.: M-quantiles. Biometrika 75(4), 761–771 (1988)
Brunauer, W., Lang, S., Umlauf, N.: Modelling house prices using multilevel structured additive regression. Stat. Modell. 13(2), 95–123 (2013)
Chambers, R., Clark, R.: An Introduction to Model-Based Survey Sampling with Applications. Oxford University Press, Oxford (2012)
Chambers, R., Tzavidis, N.: M-quantile models for small area estimation. Biometrika 93(2), 255–268 (2006)
Chambers, R., Salvati, N., Tzavidis, N.: Semiparametric small area estimation for binary outcomes with application to unemployment estimation for local authorities in the UK. J. R. Stat. Soc. Ser. A 179(2), 453–479 (2016)
Chasco, C., Le Gallo, J.: Heterogeneity in perceptions of noise and air pollution: a spatial quantile approach on the city of Madrid. Spat. Econ. Anal. 10(3), 317–343 (2015)
Chasco, C., Sánchez, B.: Valuation of environmental pollution in the city of Madrid: an application with hedonic models and spatial quantile regression. Rev. d’Econo. Reg. Urbaine 1, 343–370 (2015)
Diao, M., McMillen, D.P., Sing, T.F.: A quantile regression analysis of housing price distributions near MRT stations. Tech. rep., Annual Conference Real Estate and Urban Economics (2018)
Dreassi, E., Ranalli, M.G., Salvati, N.: Semiparametric M-quantile regression for count data. Stat. Methods Med. Res. 23(6), 591–610 (2014)
Freeman, M.: The Measurement of Environmental and Resource Values: Theory and Method. Resources for the Future, Washington (1993)
Fritsch, M., Haupt, H., Ng, P.T.: Urban house price surfaces near a world heritage site: modeling conditional price and spatial heterogeneity. Reg. Sci. Urban Econ. 60, 260–275 (2016)
Garretsen, H., Marlet, G.: Amenities and the attraction of Dutch cities. Reg. Stud. 51(5), 724–736 (2017)
Gravel, N., Michelangeli, A., Trannoy, A.: Measuring the social value of local public goods: an empirical analysis within Paris metropolitan area. Appl. Econ. 38(16), 1945–1961 (2006)
Huang, P.: Impact of distance to school on housing price: evidence from a quantile regression. Empir. Econ. Lett. 17(2), 149–156 (2018)
Huber, P.J.: Robust statistics. Springer, Berlin (2011)
Huggins, R.: On the robust analysis of variance components models for pedigree data. Aust. J. Stat. 35(1), 43–57 (1993)
Huggins, R., Loesch, D.: On the analysis of mixed longitudinal growth data. Biometrics 54(2), 583–595 (1998)
Koenker, R., Bassett, G., Jr.: Regression quantiles. Econometrica 46(1), 33–50 (1978)
Kostov, P.: A spatial quantile regression hedonic model of agricultural land prices. Spat. Econ. Anal. 4(1), 53–72 (2009)
Leung, T.C., Tsang, K.P.: Love thy neighbor: income distribution and housing preferences. J. Hous. Econ. 21(4), 322–335 (2012)
Liao, W.C., Wang, X.: Hedonic house prices and spatial quantile regression. J. Hous. Econ. 21(1), 16–27 (2012)
Mak, S., Choy, L., Ho, W.: Quantile regression estimates of Hong Kong real estate prices. Urban Stud. 47(11), 2461–2472 (2010)
Malpezzi, S.: Hedonic pricing models: a selective and applied review. In: O’Sullivan, T., Kenneth, G. (eds.) Housing Economics and Public Policy, pp. 67–89. John Wiley & Sons, Hoboken (2002)
McMillen, D.P.: Quantile Regression for Spatial Data. Springer Science & Business Media, Berlin (2012)
McMillen, D.: Conditionally parametric quantile regression for spatial data: an analysis of land values in early nineteenth century Chicago. Reg. Sci. Urban Econ. 55, 28–38 (2015)
Michelangeli, A., Zanardi, A.: Hedonic-based price indexes for the housing market in Italian cities: theory and estimation. Polit. Econ. 25(2), 109–146 (2009)
Newey, W.K., Powell, J.L.: Asymmetric least squares estimation and testing. Econometrica 55(4), 819–847 (1987)
Opsomer, J., Claeskens, G., Ranalli, M., et al.: Nonparametric small area estimation using penalized spline regression. J. R. Stat. Soc. Ser. B 70(1), 265–283 (2008)
Pratesi, M., Ranalli, M.G., Salvati, N.: Nonparametric M-quantile regression using penalised splines. J. Nonparametric Stat. 21(3), 287–304 (2009)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/
Rosen, S.: Hedonic prices and implicit markets: product differentiation in pure competition. J. Polit. Econ. 82(1), 34–55 (1974)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Schirripa Spagnolo, F., Salvati, N., D’Agostino, A., et al.: The use of sampling weights in M-quantile random-effects regression: an application to programme for international student assessment mathematics scores. J. R. Stat. Soc. Ser. C (Appl. Stat.) 69(4), 991–1012 (2020)
Tomal, M., Helbich, M.: A spatial autoregressive geographically weighted quantile regression to explore housing rent determinants in Amsterdam and Warsaw. Environ. Plan. B Urban Anal. City Sci. (2022)
Trzpiot, G.: Spatial quantile regression. Comp. Econ. Res. Central East. Eur. 15(4), 265–279 (2012)
Tzavidis, N., Salvati, N., Schmid, T., et al.: Longitudinal analysis of the strengths and difficulties questionnaire scores of the Millennium Cohort Study children in England using M-quantile random-effects regression. J. R. Stat. Soc. Ser. A 179(2), 427–452 (2016)
Uematsu, H., Khanal, A.R., Mishra, A.K.: The impact of natural amenity on farmland values: a quantile regression approach. Land Use Policy 33, 151–160 (2013)
Waltl, S.R.: Variation across price segments and locations: a comprehensive quantile regression analysis of the Sydney housing market. Real Estate Econ. 47(3), 723–756 (2019)
Wan, A.T., Xie, S., Zhou, Y.: A varying coefficient approach to estimating hedonic housing price functions and their quantiles. J. Appl. Stat. 44(11), 1979–1999 (2017)
Wood, S.N.: Generalized Additive Models: An Introduction with R, 2nd edn. Chapman and Hall/CRC, Boca Raton (2017)
Zietz, J., Zietz, E.N., Sirmans, G.S.: Determinants of house prices: a quantile regression approach. J. Real Estate Finance Econ. 37(4), 317–333 (2008)


Funding

Open access funding provided by Università di Pisa within the CRUI-CARE Agreement.

Author information

Authors and Affiliations

Università di Pisa, Pisa, Italy
Francesco Schirripa Spagnolo & Nicola Salvati
Università Degli Studi di Milano-Bicocca, Milan, Italy
Riccardo Borgoni, Antonella Carcagnì & Alessandra Michelangeli

Corresponding author

Correspondence to Francesco Schirripa Spagnolo.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schirripa Spagnolo, F., Borgoni, R., Carcagnì, A. et al. A spatial semiparametric M-quantile regression for hedonic price modelling. AStA Adv Stat Anal 108, 159–183 (2024). https://doi.org/10.1007/s10182-023-00476-w

Received: 11 July 2022
Accepted: 11 March 2023
Published: 30 March 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10182-023-00476-w

1 Introduction