Ignoring Spatial and Spatiotemporal Dependence in the Disturbances Can Make Black Swans Appear Grey

Automated valuation models (AVMs) are widely used by financial institutions to estimate the property value for a residential mortgage. The distribution of pricing errors obtained from AVMs generally shows fat tails (Pender 2016; Demiroglu and James 2018). The extreme events in the tails are usually known as “black swans” (Taleb 2010) in finance, and their existence complicates financial risk management, assessment, and regulation. We show via theory, Monte Carlo experiments, and an empirical example that a direct relation exists between the non-normality of the pricing errors and the goodness-of-fit of the house price models. Specifically, we provide an empirical example using US housing prices in which we demonstrate an almost perfect linear relation between the estimated degrees-of-freedom of a Student's t distribution and the goodness-of-fit of sophisticated valuation models with spatial and spatiotemporal dependence.


Introduction
Financial distress, such as experienced during the Great Recession and now as a result of the novel coronavirus, requires servicers and lenders to make decisions on new lending (refinancing or new purchases), forbearance, loan extensions, and foreclosure in the presence of a number of state laws and changing regulator guidelines. Most of these decisions require some estimate of collateral value. Financial institutions in the real estate market usually rely on experts, called appraisers, to estimate the value of the collateral. Because appraisers have substantial freedom in creating their estimate (Agarwal et al. 2015), the estimate is ultimately subjective. Automated valuation models (AVMs) provide an alternative approach (Pender 2016; Demiroglu and James 2018). An AVM is a computer program that uses property characteristics, past house prices, and neighboring comparable sales in a statistical model to value property.
The Great Recession raised questions about the accuracy of appraisals and AVMs. Using a large sample of nonagency securitized loans originated between 2002 and 2007, Griffin and Maturana (2016) show that AVMs are more accurate than most appraisers: 44.9% of residential properties show appraisal values that are 5% higher than the AVM valuation. Agarwal et al. (2015) examine the appraisal bias for residential refinance transactions. Specifically, they compute the valuation bias as the difference between the appraisal of the refinance transaction and the subsequent purchase price. On a US national sample of conforming loans, the authors find that the appraisal bias for residential refinance transactions is more than 5%. Ding and Nakamura (2015) compare the appraised value of the property with the agreed contract price between the buyer and the seller to analyse the impact of the Home Valuation Code of Conduct (HVCC) in 2009 on appraisal practice. They find that appraisal valuations are on average higher than the contract price both before and after the financial crisis, even though the HVCC has led to a decrease in the overestimation of property values. For the accuracy of AVMs, Demiroglu and James (2018) provide evidence that AVMs show high pricing errors of between 12% and 15% of the actual sale price for a median quality home. The variability is even higher for properties below median quality. The authors also show that high pricing errors can be the cause of an apparent appraisal bias even when the AVMs are unbiased. Kruger and Maturana (2020) analyze appraisals and AVMs on a large sample of US non-agency securitized loans originated between 2001 and 2007. They highlight that the potential sources of the errors for AVMs and appraisals are different. AVM errors are statistical errors or due to model miscalibration, whereas moral hazard could potentially generate the appraisal errors.
The authors obtain empirical evidence of high appraisals relative to AVM valuations over time and across different types of loans. Specifically, appraisals are almost 5% higher (on average) than AVM valuations and appraisals exceed AVM valuations 60% of the time.
The main contribution of this paper is to explain why AVMs may yield accurate predictions with highly non-normal errors. Specifically, this manuscript examines the relation between increasing the goodness-of-fit of housing models and the need to model the distribution of the disturbances. AVMs mainly rely on house price models that often show highly non-normal errors (Young and Graff 1996; Gu 2002; Pontines 2010; Schindler 2013) and high levels of spatial dependence (Bourassa et al. 2007; Liu 2013; Pace 1997; Osland 2010). Using both Monte Carlo simulations and empirical data, we show that addressing spatial or spatiotemporal dependence can increase the precision of the house price prediction which, in turn, can unmask non-normal disturbances. Consequently, more sophisticated approaches, such as spatial regression models (LeSage and Pace 2009), can lead to residuals that are smaller in magnitude but more clearly non-normal.
Leading AVM companies are Zillow, Collateral Analytics and CoreLogic, among others. For example, Griffin and Maturana (2016) analyse data provided by Collateral Analytics. Although all of the providers of AVMs give some statistics on their accuracy, we will use Zillow as a motivational example as Zillow provides the most visible automated valuation model available to the general public with over 100 million price estimates (known as "Z-estimates") provided on their website at no pecuniary charge.
We assume that Zillow has adequately modeled house prices using accurate explanatory variables as well as handling various forms of temporal, spatial, and spatiotemporal dependence known to affect house prices (Aquaro et al. 2021; Can and Megbolugbe 1997; Pace 1997). Therefore, the Z-estimate residuals should convey some information about the underlying distribution of the disturbances. Zillow reports that, nationally, its models have a median absolute percentage error of 4.5%. Moreover, 89.7% of the Z-estimates exhibit absolute percentage errors of 20% or less.
Under the assumption that the percentage errors are symmetric, a Student t distribution provides a common way of capturing the kurtosis that empirical residuals often display (Pontines 2010). Fitting a Student t distribution to the statistics reported by Zillow results in an estimate of 1.37 degrees-of-freedom, which implies that the variance and kurtosis are infinite. Compare this to a Cauchy distribution (Student t with 1 degree-of-freedom), which for a median absolute percentage error of 4.5% would place 86% of the absolute residuals within 20% (as opposed to 89.7% for Zillow). If the true innovations in house price models follow a Cauchy-like distribution, large errors are not particularly rare, and this has implications for financial risk management, assessment, and regulation. In the context of finance, such rare events have been labeled "black swans" (Taleb 2010).
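The fit just described can be reproduced numerically. A minimal sketch, assuming the reported national statistics (median absolute error of 4.5%, 89.7% of absolute errors within 20%) and a symmetric scaled Student t: pin the scale to the median absolute error, then solve for the degrees-of-freedom that matches the 20% coverage.

```python
from scipy.optimize import brentq
from scipy.stats import t

# Zillow-reported national statistics: median absolute percentage
# error of 4.5% and 89.7% of absolute errors within 20%.
med_abs = 0.045
p20 = 0.897

def scale_for(nu):
    # For a symmetric scaled t, the median of |e| is s * t.ppf(0.75, nu).
    return med_abs / t.ppf(0.75, nu)

def coverage_gap(nu):
    # P(|e| < 0.2) minus the target, with the scale pinned to med|e|.
    s = scale_for(nu)
    return 2.0 * t.cdf(0.2 / s, nu) - 1.0 - p20

nu_hat = brentq(coverage_gap, 1.0, 10.0, xtol=1e-10)
s_hat = scale_for(nu_hat)
print(nu_hat)
```

With these inputs the root should land near the 1.37 degrees-of-freedom cited above; replacing the two targets with a city's statistics reproduces the per-city estimates.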
In this paper we show both theoretically and through Monte Carlo simulations that disturbances in house price models which ignore spatiotemporal dependence can appear more normally distributed. Using US census data from 2000, we obtain a strong linear relationship between the accuracy of a house price model and the tail heaviness (indicated by the level of leptokurtosis) of the residuals. In sum, assuming a non-normal distribution for the disturbances provides at least one way to address the issue of "black swans." In contrast, simple models that ignore spatial or spatiotemporal dependence may lead to "normal" residuals that mask non-normality. In other words, not modeling spatial and spatiotemporal dependence can make "black swans" appear grey.
We begin with "The Zillow Example" where we analyze the kurtosis of the residuals in the Z-estimates. "The Kurtosis in Regression Models with Spatial and Temporal Dependence" documents the kurtosis present in the residuals from house price models and how this makes the risk assessment process more difficult. We motivate why residuals from non-spatial models may mask non-normality and provide both theoretical and Monte Carlo evidence to support this statement. In "An Empirical Example" we set forth an example based on housing data, where we show a very strong empirical relation between the house price accuracy and the leptokurtosis of the residuals. We finish in "Conclusion" with a summary of the key findings and discuss the implications of this research for housing models and other areas of application.

The Zillow Example
In this section we analyse the kurtosis of the residuals in the Zillow house price model. We use the data published on Zillow's website on September 30, 2016. We assume that the error random variable u follows a Student t with ν degrees-of-freedom so that fat tails can be obtained (Pontines 2010). Practically, we generate a random variable u with a given ν and examine different levels of the scale s to match the median absolute error med|e| and the proportion of errors under 20%, #(|e| < 0.2), observed in the Zillow data. We show the results in Table 2. The estimated degrees-of-freedom ν̂ vary from a low of 1.06 (Philadelphia) to a high of 2.2 (Denver). For the Student t, the first moment (mean) exists when the degrees-of-freedom exceed 1 and the variance exists when the degrees-of-freedom exceed 2. All the cities have a mean that exists, but only three cities have a finite variance (Denver, Portland, and Seattle). None of the cities have a defined skewness or kurtosis. Extending the results further, we calculate that Pittsburgh has 5.2% of predictions exceeding 50% error, while Seattle has only 1.1%.
Various studies have examined non-Gaussian error terms in house price models. Chasco et al. (2018) analyze the impact of different non-normal error distributions on a test for spatial groupwise heteroscedasticity through simulations. They also show clear non-normality in the error terms using the Jarque-Bera statistic on house prices in Madrid. As the distribution of house selling prices displays heavy tails, De Oliveira and Ecker (2019) propose a non-Gaussian hedonic spatial model for the log selling prices of 1,502 houses in Cedar Falls, Iowa. Aquaro et al. (2021) perform Monte Carlo simulations to analyse the impact of non-Gaussian errors in a spatial house price model with heterogeneous coefficients.
The lack of normality in the error terms has major implications for prediction intervals. For prediction, a key distinction exists between the confidence interval of the prediction and a prediction interval. A confidence interval for the prediction expresses the uncertainty associated with the prediction itself, mainly due to a low sample size. As the sample size increases (n → ∞), the prediction becomes less and less variable and, at some point, converges to a constant. If the model is correct, with enough data the model will yield ŷ = E(y) with no uncertainty. A prediction interval, however, takes into account the uncertainty associated with the prediction and, even more importantly, the uncertainty associated with the disturbance. With enough data, the distribution of y_i is just E(y_i) plus the distribution of the disturbance, which could be highly non-normal. In this case, increasing the sample size reduces the uncertainty of the forecast but has no effect on the uncertainty associated with the disturbance. Typical prediction intervals use the estimated standard deviation σ̂ of the random variable y_i and follow the form y_i = ŷ_i ± g·σ̂, where g captures the desired level of coverage. However, Student t distributed random variables do not have a defined variance below two degrees-of-freedom, so this approach would not work for the Zillow data. Of course, one can always define a prediction interval using the quantiles of a non-normal distribution, and it will assign substantially more probability to extreme values than the normal distribution (Table 1). The time span for the observations that Zillow reported on September 30, 2016 was not clear. To see whether the results were sensitive to the unknown time span, we estimated the statistics again using a new sample from data publicly provided by Zillow and last updated on June 26, 2019.
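As a hypothetical illustration of the quantile-based interval, the sketch below compares the half-width of a 95% interval built from Student t quantiles (ν = 1.37, scale matched to a 4.5% median absolute error) against a normal distribution matched to the same median absolute error; the specific figures are assumptions for illustration, not Zillow's.

```python
from scipy.stats import norm, t

nu = 1.37
s_t = 0.045 / t.ppf(0.75, nu)   # t scale matching median |e| = 4.5%
s_n = 0.045 / norm.ppf(0.75)    # normal scale matching the same median |e|

# Half-widths of symmetric 95% intervals built from each distribution's
# quantiles; the heavy-tailed t interval is several times wider.
half_width_t = t.ppf(0.975, nu) * s_t
half_width_n = norm.ppf(0.975) * s_n
print(half_width_t, half_width_n)
```

Even though both distributions agree on the median absolute error, the t-based interval must reach much further out to capture 95% of the probability, which is exactly the coverage problem heavy tails create.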
The results here differ somewhat from those reported in Table 2 because we focus only on off-market properties, which are more difficult to value, as documented by the error statistics. However, the results in Table 2 still show substantial kurtosis, with a maximum degrees-of-freedom of 2.22 for Seattle (versus 2.19 for Denver in the 2016 data) and a minimum of 1.37 for Baltimore (versus 1.06 for Philadelphia in the 2016 data).

The Kurtosis in Regression Models with Spatial and Temporal Dependence
In this section we examine the kurtosis in regression models with spatial and temporal dependence. "The Kurtosis of Linear Combinations" provides a theorem to compute the kurtosis for linear combinations that will be used for spatial and temporal autoregressive and moving average processes and examines the attenuation of kurtosis that arises from models that ignore temporal, spatial, and spatiotemporal dependence.
"Monte Carlo study" shows these points with specific distributions and dependence data generating processes. "The relationship between goodness-of-fit and kurtosis" describes the relationship between the goodness-of-fit of a regression model and the kurtosis of the disturbances. We also show how introducing many explanatory variables in a regression model may have the effect of raising the level of non-normality in the residuals.

The Kurtosis of Linear Combinations
The following theorem is helpful for analysing the kurtosis of spatial and temporal autoregressive and moving average processes.

Theorem 3.1 Let r be a vector of k unit-variance, symmetric, and mutually independent random variables (rvs) with level of kurtosis κ. We consider the linear combination

\[ v = r'p = \sum_{j=1}^{k} p_j r_j, \tag{3.1} \]

where the vector p = [p_1 p_2 … p_k]' represents the weights of the linear combination in Eq. 3.1. The excess kurtosis of v is

\[ \kappa_v - 3 = (\kappa - 3)\,\frac{\iota_k' p^{(4)}}{(\iota_k' p^{(2)})^2}, \tag{3.2} \]

where p^(q) contains the qth powers of the weights p_j with j = 1, 2, ..., k.

Proof The moments of the rvs r_j with j = 1, 2, ..., k are

\[ E(r_j) = E(r_j^3) = 0, \qquad E(r_j^2) = 1, \qquad E(r_j^4) = \kappa. \tag{3.3} \]

Given that r contains mutually independent unit rvs r_j, the variance of v equals the sum of the squared weights,

\[ \sigma_v^2 = \sum_{j=1}^{k} p_j^2 = \iota_k' p^{(2)}, \tag{3.6} \]

where ι_k is a vector composed of ones and p^(2) contains the squares of the weights p_j with j = 1, 2, ..., k.
The kurtosis of v defined in Eq. 3.1 is

\[ \kappa_v = \frac{E(v^4)}{\sigma_v^4}. \tag{3.7} \]

We can compute the fourth moment of v in Eq. 3.7 using the multinomial theorem,

\[ E(v^4) = E\Big[\Big(\sum_{j=1}^{k} p_j r_j\Big)^4\Big] = \sum_{q_1+\cdots+q_k=4} \frac{4!}{q_1!\,q_2!\cdots q_k!}\; p_1^{q_1}\cdots p_k^{q_k}\; E(r_1^{q_1})\cdots E(r_k^{q_k}). \tag{3.8} \]

Because the odd moments equal 0, as Eq. 3.3 shows, E(v^4) involves only even powers. The fourth powers do not involve cross-products, and therefore the contribution to E(v^4) from the fourth powers equals κ ι_k'p^(4). The only other nonzero terms are associated with the products p_i² p_j² for i ≠ j, and the multinomial coefficient gives 4!/(2!2!) = 6 such quadratic terms per pair. In summary, there will be 0 first-order terms, 6 quadratic terms per pair, 0 cubic terms, and 1 quartic term per variable in the overall polynomial in Eq. 3.8.
In actually computing the polynomial, we can easily do this via some linear algebra. We begin by defining the matrix P in Eq. 3.9 as the outer product of the squared weights,

\[ P = p^{(2)} p^{(2)\prime}. \tag{3.9} \]

We also consider the k by k matrix D whose elements on the main diagonal are given by the fourth powers of the weights p_j with j = 1, 2, ..., k and all other elements are zeros, D = diag(p^(4)).

The upper and lower triangles of the matrix P contain all the cross-products p_i^(2) p_j^(2) for i ≠ j. The matrix (P − D) eliminates the fourth-order products on the diagonal, and therefore ι_k'(P − D)ι_k contains 2 times p_i^(2) p_j^(2) for each pair i < j. From the multinomial coefficient in Eq. 3.8, the multiplicity of the products of the quadratic terms p_i^(2) p_j^(2) in a fourth-order expansion equals 4!/(2!2!) = 6. Hence, the quantity 3ι_k'(P − D)ι_k contains all the second-order cross-products with the desired multiplicity. Therefore, this allows us to simplify the expression for E(v^4) as follows:

\[ E(v^4) = \kappa\,\iota_k' p^{(4)} + 3\,\iota_k'(P - D)\,\iota_k. \tag{3.10} \]

Given the definition of kurtosis in Eq. 3.7 and the expression for E(v^4) from Eq. 3.10, as well as σ²_v from Eq. 3.6, this leads to the kurtosis of v in Eq. 3.11.
\[ \kappa_v = \frac{\kappa\,\iota_k' p^{(4)} + 3\,\iota_k'(P - D)\,\iota_k}{(\iota_k' p^{(2)})^2}. \tag{3.11} \]

Some manipulation of Eq. 3.11, using ι_k'(P − D)ι_k = (ι_k'p^(2))² − ι_k'p^(4), leads to the simpler expression in Eq. 3.12,

\[ \kappa_v - 3 = (\kappa - 3)\,\frac{\iota_k' p^{(4)}}{(\iota_k' p^{(2)})^2}, \tag{3.12} \]

where the excess kurtosis of v is a function of the weighted excess kurtosis of the component random variables r.
We can obtain a further simplification by examining a combination of random variables v_eq where each random variable r_j with j = 1, 2, ..., k shows the same level of kurtosis κ_j = κ and each random variable has equal weights p_j = 1/k. In other words, let v_eq = Σ_{j=1}^{k} r_j / k. In this case, ι_k'p^(4) = 1/k³ and (ι_k'p^(2))² = 1/k², so the kurtosis approaches 3 as the number of random variables k becomes large, as shown in Eq. 3.13:

\[ \kappa_{v_{eq}} = 3 + \frac{\kappa - 3}{k} \;\to\; 3 \quad \text{as } k \to \infty. \tag{3.13} \]

We now turn to examining cases where each random variable has the same level of kurtosis κ, but the linear combination has different weights. Again, in the spirit of seeking simple expressions for kurtosis, we examine the ratio φ between the excess kurtosis of the weighted combination v = r'p and the excess kurtosis κ − 3 of each component random variable. From Eq. 3.2 we obtain Eqs. 3.14 and 3.15:

\[ \phi = \frac{\kappa_v - 3}{\kappa - 3}, \tag{3.14} \]
\[ \phi = \frac{\iota_k' p^{(4)}}{(\iota_k' p^{(2)})^2}. \tag{3.15} \]

To make this more concrete, we illustrate the relative levels of excess kurtosis φ for two temporal processes, a Moving Average MA(1) and an Autoregressive AR(1). For a temporal MA(1) process, the weight vector is p = [1 τ]', the variance of v is σ²_v = 1 + τ², and the sum of the fourth powers is ι_k'p^(4) = 1 + τ⁴. We assume τ ∈ (−1, 1). We obtain an expression for the relative excess kurtosis in Eq. 3.16:

\[ \phi_{MA(1),t} = \frac{1 + \tau^4}{(1 + \tau^2)^2}. \tag{3.16} \]

For any level of τ, φ_{MA(1),t} ≤ 1 and, therefore, the level of kurtosis in the random variable v decreases in moving from independent r to a temporal moving average dependence process. For example, τ = 0.5 leads to φ_{MA(1),t} = 0.68.
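A quick numerical check of Eq. 3.15, both analytically and by simulation (the Laplace innovations and sample size below are illustrative choices):

```python
import numpy as np

# phi from Eq. 3.15: sum of fourth powers of the weights over the
# squared sum of the squared weights.
def phi(p):
    p = np.asarray(p, dtype=float)
    return np.sum(p**4) / np.sum(p**2) ** 2

tau = 0.5
phi_ma1 = phi([1.0, tau])
print(phi_ma1)  # (1 + 0.5**4) / (1 + 0.5**2)**2 = 0.68

# Simulation check: an MA(1) of unit-variance Laplace innovations
# (kurtosis 6) should have kurtosis near 3 + (6 - 3) * 0.68 = 5.04.
rng = np.random.default_rng(0)
eps = rng.laplace(scale=1 / np.sqrt(2), size=2_000_001)  # unit variance
v = eps[1:] + tau * eps[:-1]
kurt = np.mean(v**4) / np.mean(v**2) ** 2
print(kurt)
```

The simulated kurtosis lands close to the theoretical 5.04, confirming the attenuation predicted by Eq. 3.16.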
The relative excess kurtosis in Eq. 3.15 for a temporal AR(1) process, with geometrically declining weights p_j = τ^j for j = 0, 1, 2, ..., is

\[ \phi_{AR(1),t} = \frac{(1 - \tau^2)^2}{1 - \tau^4} = \frac{1 - \tau^2}{1 + \tau^2}. \]

Analogously, we consider spatial MA(1) and AR(1) processes. To define them, we assume an exogenous square spatial weight matrix W and an associated scalar parameter ρ. To keep this simple, we assume I_n − ρW is invertible for all ρ in the open interval (−1, 1). This condition is sufficient for row-stochastic W (which has a principal eigenvalue of 1) and for symmetric W scaled to have a principal eigenvalue of 1. In the spatial weight matrix W, the generic element w_ij is equal to a positive number when observation j is a neighbor of observation i and 0 otherwise. To prevent each observation from predicting itself, w_ii = 0 for all i. For simplicity, we assume the first-order spatial moving average process

\[ u = (I_n + \rho W)\,\varepsilon, \tag{3.20} \]

so that the ith row of I_n + ρW supplies the weight vector in Eq. 3.15. For the first-order spatial autoregressive model, the residual is given by u = (I_n − ρW)^{−1}ε, where ε represents a vector of iid rvs with kurtosis κ. To obtain a single row of (I_n − ρW)^{−1}, we solve the equation (I_n − ρW)'x = e_i for x, where e_i is a vector of zeros with a one in the ith position; the solution x' represents the ith row of (I_n − ρW)^{−1} (for symmetric W, solving (I_n − ρW)x = e_i yields the same result). Given this row, we can easily compute the variance of v, the sum of the fourth powers ι_n'p^(4), and φ_{AR(1),s}.
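The row-extraction procedure just described can be sketched as follows; the chain-structured, row-stochastic W and the values of n, ρ, and i are illustrative assumptions:

```python
import numpy as np

# Recover one row of (I - rho*W)^{-1} by solving a linear system,
# then apply Eq. 3.15 to those weights to obtain phi_AR(1),s.
n, rho, i = 100, 0.7, 50
W = np.zeros((n, n))
for j in range(n):
    nbrs = [m for m in (j - 1, j + 1) if 0 <= m < n]
    W[j, nbrs] = 1.0 / len(nbrs)        # row-stochastic contiguity chain

A = np.eye(n) - rho * W
x = np.linalg.solve(A.T, np.eye(n)[i])  # x' is row i of (I - rho*W)^{-1}
phi_s = np.sum(x**4) / np.sum(x**2) ** 2
print(phi_s)
```

Because the spatial AR(1) spreads each innovation over many locations, phi_s is below 1: spatial averaging attenuates the excess kurtosis of the underlying innovations.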
Analogously to the temporal processes, we can extend the results for the spatial MA(1) in Eq. 3.20 and for the spatial AR(1) process to MA(q) and AR(q) spatial processes.

Monte Carlo study
We perform a Monte Carlo study to better understand the reduction of excess kurtosis for a spatiotemporal process. We use a contiguity W matrix with 100 observations and generate a product-separable spatiotemporal process u = ((I_n − τL)(I_n − ρW))^{−1}ε, where L represents an n × n matrix that lags the n × 1 vector v_t so that Lv_t = v_{t−1}, and ε is a vector of iid random variables. Because the time series expansion uses 75 terms and the spatial part uses 100 observations, each trial involves 7,500 values of ε. We repeat this for 1,000 iterations and compute the empirical excess kurtosis κ̂_v − 3 and the ratio φ̂ defined in Eq. 3.14. We also derive a correction φ̂_cor obtained by dividing φ̂ by the value of the same statistic in the case of temporal and spatial independence (i.e., ρ = τ = 0). We report the t-statistics for the difference between φ̂_cor and φ in the following tables. Table 3 displays the results for Laplace distributed disturbances with κ = 6, Table 4 contains the results for logistic disturbances with κ = 4.2, and Table 5 shows the results for Pearson disturbances with κ = 4. By and large, the empirical measurements of φ̂ agree with the theoretical calculations. The Laplace distribution shows close agreement between the measured and theoretical values of φ, with none of the t statistics showing significant differences.
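One useful property of the product-separable process is that, by Eq. 3.15, the ratio φ factors into a temporal part times a spatial part, because each innovation's weight is the product of a temporal weight (a power of τ) and a spatial weight (an entry of a row of (I_n − ρW)^{−1}). A small numerical check of this factorization, under an assumed chain-contiguity W:

```python
import numpy as np

def phi(p):
    # Eq. 3.15 applied to an arbitrary array of weights.
    p = np.asarray(p, dtype=float).ravel()
    return np.sum(p**4) / np.sum(p**2) ** 2

n, tau, rho, i = 50, 0.8, 0.8, 25
W = np.zeros((n, n))
for j in range(n):
    nbrs = [m for m in (j - 1, j + 1) if 0 <= m < n]
    W[j, nbrs] = 1.0 / len(nbrs)        # row-stochastic contiguity chain

row = np.linalg.solve((np.eye(n) - rho * W).T, np.eye(n)[i])
t_weights = tau ** np.arange(75)        # truncated AR(1) expansion in time
joint = np.outer(t_weights, row)        # weight on every eps_{t-s, j}

# phi of the joint weights equals the product of the two marginal phis.
print(phi(joint), phi(t_weights) * phi(row))
```

This separability explains why the spatiotemporal reductions in the tables compound the purely temporal and purely spatial attenuations.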
We now turn to examining other spatial structures, specifically those with 15 and 30 nearest neighbors. Both of these weight matrices W were symmetricized and then made doubly stochastic so that the row and column sums equal 1. This is a common procedure in spatial econometrics (LeSage and Pace 2009, p. 88). Table 6 contains the results for the 15-nearest-neighbor W. If we consider τ = ρ = 0.8, Table 6 shows a lower value of φ (0.1064) for 15 nearest neighbors compared to the value of φ (0.1281) for 30 nearest neighbors reported in Table 7. We interpret these results by noting that a larger number of neighbors averages over more random variables, but this also reduces the variance of the average, which weakens the reduction of kurtosis.

The relationship between goodness-of-fit and kurtosis
The purpose of this section is to show how goodness-of-fit can amplify (or attenuate) the kurtosis in the residuals of spatiotemporal processes. We write y = ŷ + ê as the linear combination y = r'p with r = [ŷ/σ_ŷ, ê/σ_ê]' and weights p = [σ_ŷ, σ_ê]'. We begin with a simple setting where E(y) = 0 and σ²_y = 1, so that σ²_ŷ = R², σ²_ê = 1 − R², and, by Eq. 3.12,

\[ \kappa_y - 3 = (\kappa_{\hat y} - 3)\,R^4 + (\kappa_{\hat e} - 3)\,(1 - R^2)^2. \tag{3.21} \]

If we solve for the kurtosis of the residuals κ_ê in Eq. 3.21, we obtain Eqs. 3.22 and 3.23:

\[ \kappa_{\hat e} - 3 = \frac{(\kappa_y - 3) - (\kappa_{\hat y} - 3)\,R^4}{(1 - R^2)^2}, \tag{3.22} \]
\[ \kappa_{\hat e} = 3 + \frac{(\kappa_y - 3) - (\kappa_{\hat y} - 3)\,R^4}{(1 - R^2)^2}. \tag{3.23} \]

We can give further structure to Eq. 3.23 by modeling κ_ŷ. Intuitively, elaborate regression models may contain a large number of explanatory variables, and the excess kurtosis of a linear combination of a large number of explanatory variables could go to 0. This is also the situation that often produces a high coefficient of determination R². If ŷ is normally distributed, its kurtosis is 3 and Eq. 3.23 reduces to Eqs. 3.24 and 3.25:

\[ \kappa_{\hat e} - 3 = \alpha\,(\kappa_y - 3), \tag{3.24} \]
\[ \alpha = \frac{1}{(1 - R^2)^2}. \tag{3.25} \]

Equation 3.25 shows that any excess kurtosis in y is amplified by the factor α in terms of excess kurtosis in the residuals. For example, if R² = 0.9, the excess kurtosis of the residuals will be 100 times the excess kurtosis of y. This amplification declines rapidly with fit, so that a coefficient of determination R² = 0.5 results in only a factor of four amplification of the excess kurtosis of ê relative to the excess kurtosis of y.
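The amplification in Eqs. 3.24 and 3.25 is a one-line computation; the sketch below reproduces the two examples just given, under the assumption of normal predictions (κ_ŷ = 3):

```python
# Excess kurtosis of the residuals implied by Eqs. 3.24-3.25, given the
# excess kurtosis of y and the coefficient of determination R^2,
# assuming normally distributed predictions (kappa_yhat = 3).
def residual_excess_kurtosis(excess_y, r2):
    alpha = 1.0 / (1.0 - r2) ** 2
    return alpha * excess_y

print(residual_excess_kurtosis(1.0, 0.9))  # ~100x amplification
print(residual_excess_kurtosis(1.0, 0.5))  # 4x amplification
```

Even one unit of excess kurtosis in y becomes 100 units in the residuals when R² = 0.9, which is the core mechanism behind the "grey swans" of the title.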
We can go further in modeling κ_ŷ by introducing assumptions on the kurtosis of the individual explanatory variables and on the regression parameters. To illustrate this, we assume that ŷ uses a large number of independent explanatory variables x_a, where E(x_a) = E(y) = 0 and x_a ⊥ x_b for a ≠ b, a, b = 1, ..., k. We can think of this as coming out of a principal components analysis where each regressor is orthogonal to the others. Furthermore, we assume that the magnitude of the estimated regression coefficients follows a geometric decline, where the most important component has a weight of 1, the second most important component has a weight of g, and so forth. These assumptions are represented by Eq. 3.26,

\[ \hat y = \sum_{i=0}^{k-1} g^{i}\, x_i, \tag{3.26} \]

where g is a scalar constant giving the rate of the geometric decline in the contribution of x_i. If the number of explanatory variables x_i is high enough, the sum in Eq. 3.26 converges. We also assume a constant level of kurtosis κ_x and variance for each x_i, as well as odd moments equal to 0.
To compute the excess kurtosis of ŷ given in Eq. 3.27, we can employ the same development as for the convergence of the autoregressive AR(1) sequence above. Even though in this case there is no temporal dependence, we can use the same geometrically declining weights that occur in an AR(1) process to obtain the following result:

\[ \kappa_{\hat y} - 3 = (\kappa_x - 3)\,\frac{1 - g^2}{1 + g^2}. \tag{3.27} \]

The slower the weights decline, the less excess kurtosis ŷ will show. Equation 3.27 also shows that if the weight g = 0, the excess kurtosis of the prediction κ_ŷ − 3 equals the excess kurtosis of the explanatory variable κ_x − 3. On the contrary, for a weight g close to 1, the excess kurtosis of the prediction will be close to 0, just as for a normal random variable. From the excess kurtosis of the residuals in Eq. 3.22 and the excess kurtosis of the prediction κ_ŷ in Eq. 3.27, we derive Eq. 3.28:

\[ \kappa_{\hat e} = 3 + \frac{(\kappa_y - 3) - (\kappa_{\hat y} - 3)\,R^4}{(1 - R^2)^2}, \qquad \kappa_{\hat y} - 3 = (\kappa_x - 3)\,\frac{1 - g^2}{1 + g^2}. \tag{3.28} \]

As a specific example, we assume g = 0.9, R² = 0.9, κ_x = 6, and κ_y = 4. These parameters would lead to κ_ŷ = 3.315 and κ_ê = 82.3. Changing g to 0.8 and R² to 0.8 would yield κ_ê = 41.5. Equation 3.28 provides an explanation for why high-fit regressions can have leptokurtic residuals. First, high fit (high R²) directly amplifies any excess kurtosis in y by large factors as R² approaches 1. Second, if the high fit comes from including many explanatory variables, the excess kurtosis of the prediction, which can reduce the excess kurtosis of the residuals, tends to disappear. In that sense, adding an additional explanatory variable increases R², decreases κ_ŷ, and thus increases the kurtosis of the residuals.
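Equation 3.27 can be checked directly, both in closed form and from the explicit geometric weights via Eq. 3.15 (the truncation at 200 terms is an illustrative choice):

```python
import numpy as np

# Closed-form excess kurtosis of a prediction built from geometrically
# weighted, independent regressors (Eq. 3.27).
def kurtosis_of_prediction(kappa_x, g):
    return 3.0 + (kappa_x - 3.0) * (1.0 - g**2) / (1.0 + g**2)

# The g = 0.9, kappa_x = 6 example from the text.
print(kurtosis_of_prediction(6.0, 0.9))   # approximately 3.315

# The same number from Eq. 3.15 with explicit weights g**i.
w = 0.9 ** np.arange(200)
phi = np.sum(w**4) / np.sum(w**2) ** 2
print(3.0 + 3.0 * phi)
```

The two computations agree, illustrating how many geometrically weighted regressors push the prediction toward normality even when each regressor is leptokurtic.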
To illustrate this further, we conduct a small Monte Carlo experiment with g = 0.75, 0.85, 0.95 and R² = 0.5, 0.75, 0.95, where each entry represents the average of 25,000 trials with n = 100,000. We consider 100 explanatory variables x_i with weights given by the powers of g. The results appear in Table 8. The third column of Table 8, κ_e, represents the actual average kurtosis of the error term, generated using Laplace random variables with a theoretical kurtosis of 6. The fourth column represents its theoretical prediction given the values of ŷ and y, the fifth column contains the empirical kurtosis of the predictions, and the sixth column contains the predicted kurtosis of the predictions. Finally, the seventh column reports the measured kurtosis of y, while the last column contains its predicted value. The empirical and theoretical values are in close agreement with each other, and t tests on the differences show no statistically significant violations. From Table 8 we note that a small amount of excess kurtosis in y, with an almost identical amount of excess kurtosis in the predictions, can lead to a kurtosis of 6 in the disturbances.

An Empirical Example
In this section we provide an empirical example of the positive relation between better fitting models and leptokurtic residuals. For example, the residuals of a model with only the intercept include the true disturbances as well as the omitted variables. Both the true disturbances and omitted variables may exhibit spatial, temporal, and spatiotemporal dependence across observations. The combination of these various disturbances can produce residuals that follow a more normal distribution than the underlying innovations and thus mask the underlying innovations. Consequently, modeling these aspects of the data can uncover or unmask the distribution, normal or non-normal, of the underlying innovations.
To show this, we use 62,266 census-tract level observations from the 2000 Census. The same sample has been used by LeSage and Pace (2009, p. 272). We consider a regression model where the dependent variable y_2000 is the logarithm of the median house price in 2000; the dependent and explanatory variables appear in Table 9. Given the dependent and explanatory variables, we examined different specifications to obtain some variation in goodness-of-fit, as measured by σ̂ or the median absolute residuals, and examined whether this led to a corresponding variation in the distribution of the residuals as measured by the skewness, kurtosis, and the degrees-of-freedom from a fitted t distribution. Panel A in Table 10 shows the 12 different estimated specifications, which involve various combinations of X, WX, temporal lags, and spatial lags. We use an intercept C in all the specifications. The first column in Panel A reports the case number and the second the maximum number of parameters p in the regression model. When both WX and a time lag are included in the model, Wy_1990 is also added; this corresponds to the last row in Panel A. Across the 12 cases, the number of estimated parameters p varies between one and 26. Because the log-price in 2000, P_2K, is regressed on variables from 1990, it indirectly includes various macro factors such as inflation. However, any adjustment for inflation made over the entire sample would proportionately affect all the estimated β coefficients and the variance. It would not affect the kurtosis of the residuals, the focus of this exercise, since multiplicative transformations do not affect the kurtosis of a random variable.
Panel B in Table 10 shows the standard deviation of the residuals, the estimated kurtosis, skewness, median absolute errors, and the estimated degrees-of-freedom ν̂ from a Student t distribution fitted to the residuals. The intercept-only model (Case 1) shows mild levels of kurtosis (3.55) and skewness with a high amount of error in the residuals (median absolute error of 39.2%). The estimated degrees-of-freedom ν̂ is 12.37. The addition of spatial, temporal, and explanatory variables (Case 12) reduces the median absolute error to 8.7% but raises the estimated kurtosis to 25.23 and decreases the estimated degrees-of-freedom ν̂ to 3.05. The correlation between the median absolute error and the estimated degrees-of-freedom ν̂ is 0.976.
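The degrees-of-freedom estimates ν̂ in Panel B come from fitting a Student t to the residuals. A minimal sketch of that estimation step, using simulated t residuals in place of the actual regression residuals (the 3 degrees-of-freedom and the sample size are assumptions for illustration):

```python
import numpy as np
from scipy.stats import t

# Simulated stand-in for the heavy-tailed residuals of a well-fitting
# specification: draws from a Student t with 3 degrees-of-freedom.
rng = np.random.default_rng(42)
resid = t.rvs(df=3.0, size=50_000, random_state=rng)

# Maximum-likelihood fit returns (df, loc, scale); the first element
# is the estimated degrees-of-freedom nu-hat.
nu_hat, loc_hat, scale_hat = t.fit(resid)
print(nu_hat)
```

Applied to the residuals of each of the 12 specifications, this fit produces the ν̂ column of Panel B; lower ν̂ signals heavier tails.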
We plot in Figure 1 the relation between the estimated median absolute residuals |e| and the estimated degrees-of-freedomν.
In addition to the results reported in Table 10, we consider the following additional specifications that might affect the fit:
• squares of the explanatory variables;
• both WX and W²X;
• contiguity as well as 15 and 30 nearest neighbor W;
• matrix exponential spatial specification for autoregressive y as well as spatial fractional differencing for autoregressive y (LeSage and Pace 2007, 2009; Debarsy et al. 2015).
The matrix exponential spatial specification falls between MA and AR specifications for lower-order spatial lags, while fractional differencing assigns more importance to higher-order spatial lags than AR. Higher-order ARMA models can approximate, albeit with more parameters, fractional differencing models (Haubrich 1993, p. 767).
In total, this led to 88 different specifications attempting to capture temporal dependence, non-linearities in the response, and many forms of spatial dependence in the dependent and independent variables. For all these specifications, the residuals still demonstrate a striking relation between goodness-of-fit and measured leptokurtosis.

Conclusion
In a spatial regression model, non-normality of the underlying independent disturbances has little effect on the accuracy of the parameter estimates and on the point prediction of the dependent variable for large samples (Pace and LeSage 2008). However, it matters greatly for prediction intervals, which depend on the distribution of the underlying disturbances as well as the distribution of the prediction, the latter exhibiting a low variance for large n. The coverage of the prediction interval is important for many applications such as a house price stress test (Follain and Giertz 2011). An important goal for many of these applications is to accurately model extreme (rare) values, also known as black swans. Dependence in a spatial error model means that observed residuals come from weighted sums of underlying independent innovations. If the underlying innovations are normally distributed, the observed residuals should be normal as well. However, if the underlying innovations are non-normal, the resulting weighted sums can appear more normal than the underlying innovations. In this paper we show theoretically, through Monte Carlo simulations, and with empirical data that ignoring spatial or spatiotemporal dependence in the disturbances can make the distribution of residuals appear more normal than the underlying innovations. As this can lead to inaccurate estimates of the extreme (rare) values, the fog of spatial and temporal dependence can make black swans appear grey.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.