1 Introduction

Urban heat island (UHI; see Appendix 1 for abbreviations) is probably the most significant phenomenon of urban climate, with a strong impact on various aspects of the urban environment. Depending on climate regime and season, the socio-economic and health impacts of UHI are either positive or negative. For example, the UHI influence on human comfort, mortality, and energy usage in cold and moderate climates is positive in winter and negative in summer, while in hot climates it is negative regardless of the season. Besides that, UHI influences air pollution dispersion in cities, water usage, bioclimatic conditions, and others (Unger 2004). The most important features describing UHI are its magnitude and spatio-temporal structure. This information is expected by town planners and municipal services and is an essential input for various modeling studies (e.g., air pollutant dispersion).

Providing spatially continuous information on weather and climate at any time is a crucial but difficult task because meteorological observations and measurements are usually discretely distributed. Such a procedure requires data transformation (from discrete to continuous in space) that can be performed by various spatial interpolation (or spatialization) methods. Spatialization algorithms can be divided into several groups: deterministic and stochastic (or their combinations), exact and inexact, global and local, and one- and multidimensional. Numerous algorithms have been successfully introduced into meteorology and climatology (Dobesch et al. 2007; Tveito et al. 2008).

The general spatial structure of UHI is characterized by the occurrence of three distinct zones, named cliff, plateau, and peak (Oke 1976). This general structure can be strongly modified depending on land-use types and urban structures. During calm and clear-sky meteorological conditions, which are favorable for UHI development, and especially during nighttime, UHI takes a multicellular, irregular shape (e.g., Park 1986; Kłysik and Fortuniak 1999) and is sometimes called an urban heat archipelago (Unger 2004). The first attempts to analyze the UHI structure were based on manually interpolated isotherm maps (Duckworth and Sandberg 1954). More sophisticated interpolation algorithms became popular with increasing access to effective computers and the development of geographic information systems (GIS) (Svensson et al. 2002; Bottyán and Unger 2003; Vicente-Serrano et al. 2005; Alcoforado and Andrade 2006). Most recent studies on the spatial characteristics of UHI are based on multidimensional interpolation algorithms, with multiple linear regression (MLR) being the most often applied (Unger et al. 2010). This is because of the strong correlation of UHI with urban environment characteristics, which can be described and analyzed quantitatively in space with GIS tools. The good performance of multidimensional interpolation techniques (both MLR and its extension, residual kriging, RK), especially where observations are sparse, unevenly distributed, and do not cover the entire city area, was confirmed by earlier studies (Szymanowski and Kryza 2009). Although MLR interpolates UHI better than univariate geostatistical techniques, it can lead to distorted results when the spatial process is non-stationary, e.g., due to wind influence. Spatial non-stationarity is common for meteorological data; therefore, the applicability of a given interpolation algorithm can be strongly limited if the method is not able to deal with it. This is the main problem when applying global multidimensional algorithms like MLR.

The main goal of this paper is the application and evaluation of geographically weighted regression (GWR) for determination of the spatial structure of seven selected UHI cases measured in Wrocław (SW Poland). In the following sections, the study area and measurement data are briefly described. The next sections introduce the new set of potential spatial predictors, which were used for interpolation with both MLR- and GWR-based algorithms. The set of spatial predictors was significantly extended in comparison with the previous study of Szymanowski and Kryza (2009), with the aim of verifying whether more complex approaches to predictor calculation bring a significant gain in interpolation results. Next, the global and local regression models are introduced. An in-depth statistical analysis is performed to verify whether there is methodological (statistical) justification for a more complex approach with local models. Finally, local and global regression models, both raw and extended by interpolation of the regression residuals (residual kriging, RK, for MLR and geographically weighted regression with residual kriging, GWRK, for GWR), are used for spatial interpolation of UHI. Interpolation results are evaluated with a cross-validation (CV) approach to quantify whether the GWR and GWRK algorithms yield a smaller interpolation error than the global models when reproducing the spatial structure of the urban heat island.

2 Study area

Wrocław is a mid-sized city (293 km²; ∼640,000 inhabitants) located in SW Poland (51°N, 17°E). The average elevation of the city is ∼120 m a.s.l., and the terrain is relatively flat; therefore, the local climate is practically not affected by changes in elevation. The city is located along the Odra River. Approximately 31.4% of Wrocław is a built-up area, consisting of city and mixed series of the “local climate zone” classification system by Stewart and Oke (2009). The remaining areas of the city are mostly agricultural areas (cropped and bare fields; 28.9%), urban greenspace with semi-natural forests and grasslands (36.6%), and water (3.1%) (Fig. 1).

Fig. 1 Land-use map of Wrocław and air temperature measurement sites. U urban station, R rural station

Wrocław is located in the temperate, transitional (maritime–continental) climate, with a mean annual temperature of 8.8°C. Mean values of basic climate elements are presented in Table 1.

Table 1 Average characteristics of the Wrocław climate for the period 1971–2000 and UHI magnitude: April 1997–March 2000 (Szymanowski 2004)

The magnitude of the UHI was calculated as the air temperature difference $dT = T_{\text{U}} - T_{\text{R}}$ measured at the same time at stations U and R (Fig. 1, Table 1). The occurrence of UHI in the city center was calculated as the frequency of dT defined above. Detailed average, extreme, and frequency values of UHI in Wrocław for the period April 1997–March 2000 were presented by Szymanowski (2004, 2005) and Szymanowski and Kryza (2009). The UHI raises the annual mean temperature in the city center by 1.0 K. The thermal excess is weaker in large housing estates (0.7 K) and in residual areas (0.3 K). As in other cities of this size, the average magnitude of UHI at night is two to three times higher than the average daytime value. The maximum difference between the city center and suburban areas may exceed 9 K (Szymanowski and Kryza 2009). Positive values of UHI in the central parts of the city are observed during >96% of night hours and >80% of daytime hours, but a strong UHI effect (>5.0 K) is measured in 3.8% of night hours and only occasionally during daytime. The annual cycle of the UHI magnitude depends on meteorological conditions and the release of artificial heat. The most favorable conditions for UHI occur in the warm season, but due to increasing convective cloudiness in mid-summer, the highest values are observed in May and August. A secondary maximum of UHI intensity is observed in January (heating season), and the minima are observed in October and February. A more detailed analysis of UHI in Wrocław is provided by Szymanowski (2004, 2005) and Szymanowski and Kryza (2009).

3 Meteorological data

As one of the objectives of the paper is to determine the best interpolator of UHI, exactly the same air temperature measurements as in the former study (Szymanowski and Kryza 2009), gathered with automatic mobile meteorological stations, were used, and the reader is referred there for details on measurement and processing methodology. Seven UHI cases were observed in the years 2001–2002 during nighttime with relatively weak winds (<4 m s−1) and cloudless or moderately cloudy skies (Table 2). All UHI cases analyzed can be classified as radiative in origin. The frequency of night hours with similar UHI is 31.3%, based on measurements gathered in the period April 1997–March 2000 in Wrocław. Former studies on the UHI in Wrocław revealed that an increase of wind speed to over 4 m s−1 at night, irrespective of cloudiness, causes a considerable reduction of the UHI magnitude (Szymanowski 2005). The measurements were performed during the UHI stabilization phase (approximately equal cooling rates at urban and rural stations) to avoid fast changes in UHI magnitude (Haeger-Eugensson and Holmer 1999; Runnalls and Oke 2000). Finally, measurements from 206 points were selected systematically along the routes to represent different land-use categories, with some densification over the most interesting and geometrically diverse areas in the city center (Fig. 1).

Table 2 Meteorological conditions in the city outskirts and UHI magnitude in the city center during measurements

4 Methods

The overall methodology of the study encompassed six stages, which are described accordingly:

  1. Preparation of a spatially continuous set of potential UHI predictors required for multidimensional interpolation algorithms

  2. Specification and evaluation of the MLR models

  3. Specification and evaluation of the GWR models and selection of the kernel type and size, testing for spatial non-stationarity of parameter estimates

  4. Comparison of regression models using ANOVA

  5. Extension of the regression models by interpolation of residuals

  6. Evaluation of the spatial interpolation results calculated with four models: MLR, GWR, RK, and GWRK.

The set of potential UHI predictors was prepared with GIS tools provided by the GRASS GIS (GRASS Development Team 2010) and ArcGIS systems. The statistical analysis (stages 2–6) was performed with the R statistical package (R Development Core Team 2010) and the GWR3 software for geographically weighted regression (Charlton et al. 2010).

4.1 Potential UHI predictors

High-rise development, introduction of new surface materials (mostly water-proof, opaque, and air-tight), emission of artificial heat, moisture, and pollutants are among the leading factors responsible for aerodynamic, radiative, thermal, and moisture modifications of the local climate in cities and responsible for UHI phenomena (Oke 1987). Most of the features describing size, geometry, thermal properties, and “metabolism” of the cities may be derived from maps, digital databases, and satellite imagery by GIS techniques. All the spatially continuous information can be used as additional explanatory variables in the UHI spatialization process with multidimensional methods (Bottyán and Unger 2003; Alcoforado and Andrade 2006; Szymanowski and Kryza 2009).

In the previous study, the authors of this paper compared various algorithms for UHI spatial interpolation and used a set of six spatial predictors for multidimensional spatialization, which were derived mostly from the land-use map of Wrocław and the buildings database available only for selected areas of the city (Szymanowski and Kryza 2009). The regression analysis showed that for some UHI cases, over 30% of temperature variance remained unexplained. Therefore, the question arose whether relatively simple predictors derived from the land-use map (for example, roughness length) are detailed enough for the spatial interpolation procedure, or whether the interpolation results can be improved by providing other spatial predictors or predictors derived with more complex approaches. Here, a state-of-the-art LIDAR-originated database, together with a 3D trees database and a digital elevation model (DEM), was used to expand the set of potential predictors and develop new ones. These were supported by an extensive set of Landsat ETM+-derived information. All potential predictors are described in the following sections, including a short introduction of the previously applied independent variables (Section 4.1.1) and the newly derived ones (Sections 4.1.2–4.1.5).

4.1.1 Land-use map derivatives

This set of data was prepared with the land-use map supported by topographic maps (1:10,000), orthophotomap, and 3D building database for selected areas of the city. Analysis was performed for ca. 9 ha testing fields for each land-use category, and the achieved values were assumed to be typical of a given class. All variables of this group were used in the previous study by Szymanowski and Kryza (2009):

  1. Roughness length ($z_0$; meters), which is one of the most important parameters describing properties of the urban boundary layer, was calculated using the modified formula proposed by Lettau (1969). Due to limited information on building geometry, some simplifications were assumed: the lot area was held equal to the area of the given land-use class, and wind direction was not incorporated in the silhouette parameter. High values of the predictor are related to areas of decreased wind speed and turbulent fluxes. A positive regression coefficient is expected for this predictor.

  2. Percentage of artificial surfaces (AS, percent) in a given land-use class took into consideration both horizontal (e.g., roofs, roads) and vertical surfaces (walls). Artificial surfaces were added together and related to the lot area. Buildings were represented by boxes, and roof structures were not considered during calculations. The predictor describes jointly the areas of altered energy balance leading to a positive thermal anomaly, as described by Oke (1987). A positive regression coefficient is expected for this predictor.

  3. Percentage of semi-natural surfaces (NS, percent) in a given land-use class. Calculations for this parameter were similar to AS, but only horizontal surfaces were considered. A negative regression coefficient is expected.

  4. Thermal admittance (μ, J m−2 s−1/2 K−1), estimated as a weighted value of the ratio of vegetated surfaces to artificial surfaces. Thermal admittances for concrete (built-up classes) and moderately moist (40%) clay soil covered by grass (non-built-up classes excluding water) were used as starting values, after Boeker and van Grondelle (1995). The predictor describes the areas of increased sensible heat storage and is expected to be positively correlated with air temperature.

4.1.2 Landsat ETM+ derivatives

In this study, three Landsat ETM+ scenes were used. Because Landsat data acquisition is repeated every 16 days over each place, it was impossible to find imagery exactly for the same day as the temperature measurements were gathered. The selected scenes are considered as complementary for more than one session of mobile measurements (Table 3). Landsat data are taken over Wrocław at ∼9:30–9:40 UTC. Daytime images were used in this work because of availability and applicability for calculation of several predictors, including albedo (not possible to calculate from nighttime imagery).

Table 3 Days of air temperature measurements and corresponding Landsat ETM+ data

For the purpose of this study, radiometrically and geometrically corrected (L1T) product is used (Landsat 7 Science Data Users Handbook 2010). Atmospheric correction was applied using single-channel algorithm (Jiménez-Muñoz et al. 2009). Initial data processing encompasses conversion of each spectral band from digital numbers to radiance units and then conversion from radiance to reflectance (Landsat 7 Science Data Users Handbook 2010). The whole group of six remotely sensed parameters consists of:

  1. Albedo (a, unitless [0, 1]), considered as reflectance for the panchromatic band (Landsat ETM+ band 8):

    $$ a = \frac{{\pi \cdot L_{{{\text{PAN}}}} \cdot d^{2} }}{{{\text{ESUN}}_{{{\text{PAN}}}} \cdot \cos \theta _{{\text{s}}}}} $$
    (1)

    where $L_{\text{PAN}}$ is the spectral radiance for the panchromatic band [W m−2 sr−1 μm−1], d is the Earth–Sun distance in astronomical units, $\text{ESUN}_{\text{PAN}}$ is the mean solar exoatmospheric irradiance (Landsat 7 Science Data Users Handbook 2010), and $\theta_{\text{s}}$ is the solar zenith angle in degrees. Negative correlation with UHI is expected.

  2. Vegetation indices:

    • Normalized Difference Vegetation Index (NDVI) is a modulation ratio of reflectance (ρ) for the near-infrared (NIR) and red (RED) bands that indicates vegetation (Tucker 1979):

      $$ {\text{NDVI}} = \frac{{{\rho_{\text{NIR}}} - {\rho_{\text{RED}}}}}{{{\rho_{\text{NIR}}} + {\rho_{\text{RED}}}}} $$
      (2)

    • Soil Adjusted Vegetation Index (SAVI) is a superior vegetation index for low-cover environments (Huete 1988):

      $$ {\text{SAVI}} = \left( {\frac{{{\rho_{\text{NIR}}} - {\rho_{\text{RED}}}}}{{{\rho_{\text{NIR}}} + {\rho_{\text{RED}}} + L}}} \right)(1 + L) $$
      (3)

      where L is an empirically determined constant that minimizes the sensitivity of the vegetation index to variation in soil background reflectance (Schowengerdt 2007). In this case, L is set to 0.5.

    • Normalized Difference Moisture Index (NDMI) contrasts the NIR band, sensitive to the reflectance of leaf chlorophyll content, with the mid-infrared band (MIR), sensitive to the absorbance of leaf moisture (Wilson and Sader 2002):

      $$ {\text{NDMI}} = \frac{{{\rho_{\text{NIR}}} - {\rho_{\text{MIR}}}}}{{{\rho_{\text{NIR}}} + {\rho_{\text{MIR}}}}} $$
      (4)

      Vegetation indices are unitless, and their range is [−1, +1].

    In Wrocław, deciduous trees dominate over coniferous ones, so in winter the vegetation indices mentioned above, which are based on chlorophyll content, are at the same level for wooded and built-up areas. Negative correlation with UHI is expected (Szymanowski and Kryza 2011).

  3. Land surface temperature ($T_{\text{ls}}$, kelvin) was calculated from emissivity (ε) and at-satellite temperature ($T_{\text{as}}$, kelvin) with the single-channel algorithm (Jiménez-Muñoz et al. 2009). Atmospheric parameters were estimated using the Atmospheric Correction Parameter Calculator (2010). $T_{\text{as}}$ is converted from spectral thermal infrared (TIR) radiance and is considered the effective at-satellite temperature of the viewed Earth–atmosphere system under the assumption of unity emissivity. The conversion formula is:

    $$ {T_{\text{as}}} = \frac{{{K_2}}}{{\ln (\frac{{{K_1}}}{{{L_{\text{TIR}}}}} + 1)}} $$
    (5)

    where $L_{\text{TIR}}$ is the spectral radiance for the TIR band [W m−2 sr−1 μm−1], and the calibration constants $K_1$ and $K_2$ are equal to 666.09 W m−2 sr−1 μm−1 and 1,282.71 K, respectively (Landsat 7 Science Data Users Handbook 2010).

    Uncorrected $T_{\text{ls}}$ is equal to $T_{\text{as}}$ on the assumption that ε = 1, i.e., that all emitting materials are ideal blackbodies with 100% radiative efficiency. If the emissivity in the thermal region is known, surface temperature can be calculated more precisely.

    The correlation between land surface and air temperature changes seasonally (Szymanowski and Kryza 2011). For winter, when artificial built-up areas are the warmest, high negative correlation coefficients were calculated (R = −0.90). If snow cover is observed outside the city center, which is the case in January, air temperature is strongly correlated with land surface temperature. In summer and particularly in autumn, after harvest, when fields are bare, the day and night thermal conditions of the land surface differ significantly in the city outskirts, while built-up areas are warm irrespective of the diurnal cycle.

  4. Emissivity (ε, unitless [0, 1]) is defined as the ratio of the spectral radiant exitance of a graybody to that emitted by a blackbody at the same temperature (Schowengerdt 2007). In urban areas, the emissivity of typical man-made materials in the TIR band of Landsat ETM+ ranges from 0.40 to 0.98 (Stathopoulou et al. 2007). There are numerous techniques to retrieve emissivity from satellite multispectral imagery (Sobrino and Raissouni 2000; Sobrino et al. 2008; Stathopoulou et al. 2007). The method applied here reclassifies the study area according to NDVI values, and emissivity is then derived separately for three NDVI classes:

    a. for bare soil, rocks, and artificial materials in the urban environment (NDVI < 0.2):

      $$ \varepsilon = 1 - {\rho_{\text{RED}}} $$
      (6)

      where $\rho_{\text{RED}}$ is reflectance for the RED band;

    b. for vegetated areas (NDVI > 0.5), ε is assumed to be constant and equal to 0.98;

    c. for areas representing a mixture of vegetated and non-vegetated surfaces (0.2 ≤ NDVI ≤ 0.5), the formula proposed by Valor and Caselles (1996) is used (after parametrization by Stathopoulou et al. 2007):

      $$ \varepsilon = 0.017 \cdot \frac{{{{({\text{NDVI}} - 0.2)}^2}}}{{{{(0.5 - 0.2)}^2}}} + 0.963 $$
      (7)
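As an illustration of Eqs. 1–5, the following minimal Python sketch computes the albedo, the three vegetation indices, and the at-satellite temperature for a single pixel. It is not the processing chain used in this study; the function names and the sample input values are hypothetical, and only the constants $K_1$, $K_2$, and L = 0.5 are taken from the text.

```python
import math

# ETM+ thermal band calibration constants quoted with Eq. 5
K1 = 666.09    # W m-2 sr-1 um-1
K2 = 1282.71   # K


def albedo(l_pan, d_au, esun_pan, zenith_deg):
    """Eq. 1: reflectance of the panchromatic band (zenith angle in degrees)."""
    return (math.pi * l_pan * d_au ** 2) / (esun_pan * math.cos(math.radians(zenith_deg)))


def ndvi(nir, red):
    """Eq. 2: normalized difference of NIR and RED reflectance."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_l=0.5):
    """Eq. 3 with the soil constant L = 0.5 used in the text."""
    return (nir - red) / (nir + red + soil_l) * (1.0 + soil_l)


def ndmi(nir, mir):
    """Eq. 4: normalized difference of NIR and MIR reflectance."""
    return (nir - mir) / (nir + mir)


def at_satellite_temperature(l_tir):
    """Eq. 5: at-satellite temperature in kelvin from TIR radiance."""
    return K2 / math.log(K1 / l_tir + 1.0)


# Hypothetical pixel values, for illustration only
pixel_ndvi = ndvi(nir=0.40, red=0.10)
pixel_savi = savi(nir=0.40, red=0.10)
pixel_t_as = at_satellite_temperature(l_tir=8.0)
```

In practice the same expressions would be applied cell by cell to the reflectance and radiance rasters.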

Further corrections are applied for the emissivity layer based on land-use map. The emissivity for water areas is often too low (0.90–0.93), therefore all are reclassified to 0.99. Similarly, areas covered by snow in the winter case (a > 0.5) are set to 0.99 (Arnfield 1982).
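The emissivity rules of Eqs. 6–7, together with the water and snow corrections, can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: the function name and the boolean flags `is_water` and `is_snow` are hypothetical stand-ins for the land-use mask and the winter albedo threshold (a > 0.5).

```python
def emissivity(ndvi, rho_red, is_water=False, is_snow=False):
    """Emissivity by NDVI class (Eqs. 6-7) with the water/snow override."""
    if is_water or is_snow:
        # Correction described in the text: water and snow are reset to 0.99
        return 0.99
    if ndvi < 0.2:
        # Bare soil, rocks, artificial materials (Eq. 6)
        return 1.0 - rho_red
    if ndvi > 0.5:
        # Fully vegetated areas
        return 0.98
    # Mixed pixels, 0.2 <= NDVI <= 0.5 (Eq. 7)
    return 0.017 * (ndvi - 0.2) ** 2 / (0.5 - 0.2) ** 2 + 0.963


# Hypothetical pixels, for illustration only
eps_urban = emissivity(ndvi=0.10, rho_red=0.08)
eps_mixed = emissivity(ndvi=0.35, rho_red=0.06)
eps_river = emissivity(ndvi=0.05, rho_red=0.09, is_water=True)
```

Note that Eq. 7 reaches 0.98 at NDVI = 0.5, so the mixed class joins the vegetated class without a jump.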

The seasonal change of sign of correlation coefficients can be observed when analyzing emissivity, with negative correlation in warm season and positive in cold season, if snow cover is present (Szymanowski and Kryza 2011).

4.1.3 LIDAR scan derivatives

LIDAR measurements for the city area were converted to a 1-m resolution raster dataset, with buildings, trees, and shrubs separated from terrain elevation. Building heights (max, min, average) were available from the geodetic database (vector format), and the following information was derived and used in UHI interpolation:

  1. Roughness length ($z_0$; meters). Various procedures for estimation of $z_0$ are available (e.g., Grimmond et al. 1998; Grimmond and Oke 1999), and here, the formula proposed by Bottema (1997) and Bottema and Mestayer (1998) was used, with the simplification proposed by Gal and Unger (2009):

    $$ {z_0} = h(1 - \lambda_{\text{P}}^{{0.6}})\exp \left( { - \sqrt {{\frac{{0.4}}{{{\lambda_{\text{F}}}}}}} } \right) $$
    (8)

    where h is the average building height, $\lambda_{\text{P}}$ is the plan area ratio, and $\lambda_{\text{F}}$ is the frontal area ratio (calculated for eight main wind directions). For the purpose of this study, the algorithm proposed by Gal and Unger (2009) was modified to provide spatially continuous information on $z_0$. The distance from a building (or a group of buildings) to the border of its lot area was limited to a maximum of 10h, because leaving it unlimited led to overestimation of lot areas for sparse, low development. The gaps between lot areas and non-built-up areas within the city boundaries are filled with the same values as used in the previous paper (Szymanowski and Kryza 2009) for a given land-use class. A positive regression coefficient is expected for this predictor (see Section 4.1.1 for details).

  2. Porosity (P, unitless [0, 1]) is a measure of how penetrable the area is for airflow and can be defined as the ratio of the volume of open air to the volume of the urban canopy layer over the same area. In this case, all calculations were performed for square lot areas of 1 ha ($A_{\text{T}}$ = 10,000 m²) with 1-m resolution raster datasets of buildings, trees, and shrubs. The formula for the porosity of buildings proposed by Gal and Unger (2009) was modified to account for the influence of trees and shrubs:

    $$ P = {P_{\text{b}}} + {P_{\text{ts}}} = \frac{{{A_{\text{T}}}{h_{\text{b}}} - {V_{\text{b}}}}}{{{A_{\text{T}}}{h_{\text{b}}}}} + (1 - p)\frac{{{A_{\text{T}}}{h_{\text{ts}}} - {V_{\text{ts}}}}}{{{A_{\text{T}}}{h_{\text{ts}}}}} $$
    (9)

    where $P_{\text{b}}$ ($P_{\text{ts}}$) is the porosity of buildings (trees and shrubs), $h_{\text{b}}$ ($h_{\text{ts}}$) is the mean height of buildings (trees and shrubs) in the lot area, $V_{\text{b}}$ ($V_{\text{ts}}$) is the sum of volumes of the buildings (trees and shrubs), and p is the porosity index of trees. The value of p is equal to 0.2 when deciduous trees are in leaf and is set to 0.6 for the leafless period (Heisler and DeWalle 1988). The predictor works in the opposite way to roughness length, and negative correlation with UHI is expected.

  3. Sky View Factor (SVF, unitless [0, 1]), defined as the hemispherical fraction of unobstructed sky visible from a given location. Here, the computationally efficient approach based on the hillshading algorithm proposed by Corripio (2003) was used to derive spatial information on SVF for the Wrocław area. The solar azimuth and elevation steps were set to 2° for computational efficiency, with 1-m spatial resolution of the digital elevation model. The predictor is related to the geometry of buildings and street canyons: low values indicate obstructed sky and decreased long-wave radiation loss, so negative correlation with UHI is expected.

  4. Daily sums of solar irradiation (DSI, excluding walls; DSIw, including walls; Wh m−2), calculated using the r.sun model implemented in GRASS GIS (Šuri and Hofierka 2004; Hofierka and Kaňuk 2009). Sums of daily total solar irradiation for the day preceding the nighttime UHI were calculated. The shadowing effects of nearby buildings were included. Because r.sun works only with 2D raster elevation layers, the model can be applied directly only to selected building surfaces (roofs) and to the areas between buildings. The solar energy reaching building walls was therefore approximated by setting specific values of aspect, slope, and height for the 1-m resolution raster elements representing walls. The aspect was set according to the real orientation of the wall calculated from the vector model, and the slope was set to 90°. The relative height of the wall was set to half of the real height to account for the shadowing of the wall by the surrounding buildings. The shadowing effect of trees was not included. DSI is negatively correlated with air temperature (Szymanowski and Kryza 2011), which can be explained by the strong shadowing effect of compact development in densely built-up parts of the city. This causes relatively low sums of energy reaching the areas between the buildings, where the measurements were performed. DSIw incorporates façade surfaces, whose role can surpass that of relatively flat terrain and roofs, especially in winter when the sun position is low; it is expected to be positively correlated with UHI.
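Among the LIDAR-derived predictors, the roughness length of Eq. 8 is simple enough to evaluate for a single lot area, as in the minimal Python sketch below. The function name and the sample values are hypothetical; returning zero for an empty lot is an assumption of this sketch that follows from the limit of Eq. 8 as the frontal area ratio approaches zero.

```python
import math


def roughness_length(h, lambda_p, lambda_f):
    """Eq. 8: z0 [m] from mean building height h [m], plan area ratio
    lambda_p and frontal area ratio lambda_f (both unitless)."""
    if lambda_f <= 0.0:
        # Assumption: no frontal area means no building-induced roughness
        # (the exponential term of Eq. 8 tends to zero in this limit).
        return 0.0
    return h * (1.0 - lambda_p ** 0.6) * math.exp(-math.sqrt(0.4 / lambda_f))


# Hypothetical lot: 15 m mean height, 30% plan area, frontal area ratio 0.2
z0_example = roughness_length(h=15.0, lambda_p=0.3, lambda_f=0.2)
```

For a fixed height and plan area ratio, the value grows with the frontal area ratio, which is consistent with the positive regression coefficient expected for this predictor.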

4.1.4 Artificial heat emission

Anthropogenic heat release ($Q_{\text{A}}$, watts per square meter) was earlier estimated by Chudzia and Dubicka (1998) for the Wrocław area based on a detailed inventory of energy (electricity and fuel) consumption in the late 1990s. $Q_{\text{A}}$ was estimated for the non-heating (April to October) and heating (November to March) seasons in various parts of the city and in various land-use classes. Positive correlation with UHI is expected for $Q_{\text{A}}$, regardless of the season.

4.1.5 Spatial predictors’ derivatives

Spatially continuous variables described above were considered as potential predictors of the spatial structure of the UHI. However, spatial gradients of air temperature are smoothed due to air flow and turbulence, and therefore less pronounced, than “sharp” transitions typical of high-resolution satellite imagery and LIDAR data. Moreover, the air temperature in a given location is influenced by thermal conditions of the surrounding areas (source region), with the effective radius of ∼0.5 km (screen level rule-of-thumb; Oke 2004; Szymanowski and Kryza 2009), and depends mainly on building density. To incorporate the source region effect in the interpolation procedure, a set of raster layers for each parameter described above was calculated with the focal mean filter tool. For each raster element, the filter calculates the average of the values within a specified neighborhood of the input raster map. The averaging reduces isolated high values and smoothes sharp gradients in the original high-resolution data. The averaging matrices applied here are circular in shape with radii varying from 25 to 1,000 m.
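The focal mean with a circular window can be sketched in pure Python as shown below. This is only an illustration of the filtering idea, not the GIS tool used in the study; the function name is hypothetical, the radius is given in raster cells (with 1-m rasters, 25 cells would correspond to 25 m), and truncating the window at the raster edge is an assumption of this sketch.

```python
def focal_mean(grid, radius_cells):
    """Average of all cells whose centers lie within radius_cells of each cell."""
    rows, cols = len(grid), len(grid[0])
    r = int(radius_cells)
    # Offsets that make up the circular window, computed once
    offsets = [(di, dj)
               for di in range(-r, r + 1)
               for dj in range(-r, r + 1)
               if di * di + dj * dj <= radius_cells ** 2]
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Assumption: the window is simply truncated at the raster edge
            vals = [grid[i + di][j + dj]
                    for di, dj in offsets
                    if 0 <= i + di < rows and 0 <= j + dj < cols]
            out[i][j] = sum(vals) / len(vals)
    return out


# Hypothetical 5 x 5 raster with a single isolated high value in the center
raster = [[0.0] * 5 for _ in range(5)]
raster[2][2] = 9.0
smoothed = focal_mean(raster, radius_cells=1)
```

The isolated peak is spread over its neighbors and reduced in height, which is the smoothing of sharp gradients described above.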

4.2 Global linear regression model

The global regression model can be expressed by:

$$ {y_i} = {\beta_0} + \sum\limits_k {{\beta_k}} {x_{{ik}}} + {\varepsilon_i} $$
(10)

where $y_i$ is the dependent (interpolated) variable, $x_{ik}$ are the explanatory (independent, predictor) variables, $\beta_0$ is the model intercept, $\beta_k$ are the coefficients of linear regression, and $\varepsilon_i$ is the error term at point i (regression residual). The method can be used to spatialize discrete point data on the assumption that the auxiliary, independent variables are known and continuous in space, or, technically, that they can be provided as raster layers. MLR has been successfully used for climatological purposes, as well as for UHI spatialization (Svensson et al. 2002; Bottyán and Unger 2003; Vicente-Serrano et al. 2005; Alcoforado and Andrade 2006; Szymanowski and Kryza 2009).

Independent variables were selected for each UHI case from the set of potential predictors described in Section 4.1. To ensure proper specification of the regression model from 206 measurement points, the number of independent variables was limited to five or fewer. Selection of the predictors was performed stepwise, taking into account their statistical significance and the lack of collinearity with other independent variables, assessed with the variance inflation factor (VIF). Also, the direction of dependence (sign of the β coefficient) between air temperature and each independent variable was checked to ensure that the final equation can be explained in terms of known physical processes that influence UHI formation. The required confidence level was 95% (p value < 0.05), and VIF should not exceed 10, unless the considered variable significantly improves the overall regression model. Usually the same sign of β is expected for a given parameter throughout the whole year. However, in some cases, it can change with meteorological conditions, for example due to snow cover, as was observed for $T_{\text{ls}}$ in winter cases (Szymanowski and Kryza 2011).
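The collinearity check can be illustrated with a minimal pure-Python sketch that fits Eq. 10 by ordinary least squares and derives the VIF of a predictor as 1/(1 − R²), where R² comes from regressing that predictor on the remaining ones. The study itself used R; the helper names (`solve`, `ols`, `r_squared`, `vif`) and the tiny data set are hypothetical and serve only to show the mechanics.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(xs, y):
    """Coefficients [b0, b1, ...] of Eq. 10; xs is a list of predictor rows."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(xs, y):
    """Coefficient of determination of the least-squares fit."""
    beta = ols(xs, y)
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in xs]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(xs, j):
    """VIF of predictor j from regressing it on the other predictors."""
    target = [row[j] for row in xs]
    others = [[v for i, v in enumerate(row) if i != j] for row in xs]
    return 1.0 / (1.0 - r_squared(others, target))


# Hypothetical data: two nearly identical predictors give a very large VIF
collinear = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]]
vif_collinear = vif(collinear, 0)
```

A predictor whose VIF exceeds the threshold of 10 would be a candidate for removal in the stepwise selection described above.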

4.3 Geographically weighted regression model

The MLR method can be applied to spatial data under the assumption of spatial stationarity and location independence. In other words, it is assumed that the same stimulus (given as a spatial predictor) provokes the same response in all areas of the study region. This assumption is usually hard to meet, especially in the field of meteorology and climatology, where many processes can be considered spatially unstable. One solution is to substitute the global regression model with a local one. This approach is known as geographically weighted regression (Fotheringham et al. 2002). It is a non-parametric model of spatial drift that relies on a sequence of locally linear regressions to produce estimates for every point in space by using a subset of information from nearby observations. Mathematically, as an extension of the global linear regression model (Eq. 10), GWR can be described as:

$$ {y_i} = {\beta_0}({u_i},{v_i}) + \sum\limits_k {{\beta_k}({u_i},{v_i})} {x_{{ik}}} + {\varepsilon_i} $$
(11)

where $(u_i, v_i)$ denotes the coordinates of the ith point in space, $\beta_0$ and $\beta_k$ are parameters to be estimated, and $\varepsilon_i$ is the random error term at point i. The local estimates are made using weighted regression, and the weights assigned to observations are a function of the distance from point i. Larger weights are assigned to observations closer to point i. Therefore, the weighting of an observation is not constant but is a function of geographical location. The role of individual observations is represented by a weighting matrix, for which it is necessary to choose a weighting scheme. For continuous processes, a monotone decreasing function (Gaussian or near-Gaussian; Fotheringham et al. 2002) is appropriate, and kernel functions are suggested for constructing the weights. The spatial kernel functions can be divided into two categories: fixed and adaptive. A fixed spatial kernel uses a single optimum bandwidth, which is determined once and applied uniformly across the study area. The alternative is a spatially adaptive kernel function, whose bandwidth (distance) varies and is determined by a constant number of observations retained within the weighting kernel “area,” irrespective of distance. An adaptive kernel function is able to adjust to irregularly spaced measurements: the kernels have larger bandwidths in areas where the data are sparse and smaller ones where the data are densely distributed (Fotheringham et al. 2002).

To retain the comparability of the global and local regression models, the same explanatory variables as specified for MLR were used in the GWR model. Calibration of the local regression model in a deterministic manner included selection of the kernel type and bandwidth size, and verification that the local model can be physically explained over the whole study area (regression coefficients change in space). Due to the irregular distribution of the sampling points, an adaptive kernel (Gaussian shape) was used. The optimal bandwidth size was determined with an iterative procedure: all bandwidth sizes exceeding 20 points, which was recognized as the minimum for proper calibration of the local regression model with five independent variables, were tested for the expected physical relation between UHI and each predictor, expressed by the regression coefficient. The analysis was repeated for all considered UHI cases, and the following diagnostic measures were calculated:

  1. AICc, the corrected Akaike Information Criterion (Hurvich et al. 1998), which is a measure of model performance and is helpful for comparing different regression models. Taking into account model complexity, the model with a lower AICc value provides a better fit to the observed data;

  2. Estimated standard deviation for the residuals (σ). The models with smaller values of this statistic are preferable;

  3. Global and local minimum and maximum determination coefficients (\( R^2 \)) as measures of goodness of fit.

The last question to be answered in the process of local regression model specification is whether each set of parameter estimates exhibits significant spatial variation over the study area. If the localized parameter estimates do not differ in a statistically significant way, the GWR model can be considered equivalent to the global regression model, even though the local parameter estimates show spatial variation. Moreover, if a test shows that any independent variable is spatially stationary, a mixed GWR model may be more appropriate (Fotheringham et al. 2002; Yu and Wu 2004; Yu 2006). In this paper, two tests were employed to address the issue: Monte Carlo (Charlton et al. 2010) and F3 (Leung et al. 2000). Both tests are applied to verify whether the local model (GWR) offers an improvement over the global model (MLR). In the Monte Carlo approach, the significance of the variability of individual coefficients is tested. The geographical coordinates of the observations are randomly permuted against the variables n times, resulting in n values of the variance of the coefficient of interest, which are used as an experimental distribution. The actual value of the variance is compared against this list to calculate an experimental significance level. The analytical F3 method is less computationally intensive than the Monte Carlo algorithm, and it tests the variability of the variance under a null hypothesis of a stationary coefficient. The detailed equations are provided by Leung et al. (2000).
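The Monte Carlo procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: the local coefficients come from a simple adaptive Gaussian GWR (bandwidth taken as the distance to the n-th nearest observation), and the names local_betas, monte_carlo_pvalues, n_neighbors, and n_perm are hypothetical.

```python
import numpy as np

def local_betas(coords, X, y, n_neighbors):
    """GWR coefficients at every observation (assumed adaptive Gaussian kernel)."""
    A = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), A.shape[1]))
    for i, p in enumerate(coords):
        d = np.linalg.norm(coords - p, axis=1)
        w = np.exp(-0.5 * (d / np.sort(d)[n_neighbors - 1]) ** 2)
        AtW = A.T * w
        betas[i] = np.linalg.solve(AtW @ A, AtW @ y)
    return betas

def monte_carlo_pvalues(coords, X, y, n_neighbors, n_perm=99, seed=0):
    """Experimental significance of the spatial variance of each local coefficient."""
    rng = np.random.default_rng(seed)
    observed = local_betas(coords, X, y, n_neighbors).var(axis=0)
    exceed = np.ones_like(observed)  # the observed arrangement counts once
    for _ in range(n_perm):
        # Permute the coordinates against the variables, then refit.
        shuffled = coords[rng.permutation(len(coords))]
        exceed += local_betas(shuffled, X, y, n_neighbors).var(axis=0) >= observed
    return exceed / (n_perm + 1)
```

A small p-value for a coefficient indicates that its observed spatial variance is unlikely under a random arrangement of the sites, i.e., that the coefficient is non-stationary.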

4.4 Comparison of regression models

Both models, MLR and GWR, were evaluated and compared for all analyzed UHI cases using a set of statistics that check the goodness of fit of the models to the observations. Additionally, the spatial autocorrelation of regression residuals (Moran’s I statistic) was analyzed to detect possible problems with the specification of the model under non-stationary conditions of the spatial processes.

To test whether the GWR model offers an improvement over the MLR model, an analysis of variance (ANOVA) was used (Fotheringham et al. 2002). The ANOVA compares the residual sums of squares of the two models and tests, with the F test, the null hypothesis that the GWR model represents no improvement over the MLR model. As pointed out by Leung et al. (2000) and Fotheringham et al. (2002), the GWR model certainly fits a given dataset better than a global MLR model. However, in practice, a simpler model is usually preferred over a more complex one if the latter brings no significant improvement, and this was addressed with ANOVA.

4.5 Extension of the local and global models by interpolation of the residuals—the residual kriging approach

In the regression methods described above (Eqs. 10 and 11), a part of the variance is not explained by the model (residuals, \( \varepsilon_i \)). In terms of interpolation, this means that regression methods are inexact interpolators. The regression residuals are assumed to be randomly distributed. The RK algorithm sums the trend component (deterministic, explained by the regression model) and the residuals (stochastic part) interpolated with a kriging technique (usually ordinary kriging, OK):

$$ {\text{R}}{{\text{K}}_{{{T_i}}}} = {y_i} + {\varepsilon_i} $$
(12)

where \( {\text{R}}{{\text{K}}_{{{T_i}}}} \) is the air temperature calculated with residual kriging for grid cell i, \( y_i \) is the air temperature calculated with the regression model (MLR or GWR, Eqs. 10 and 11, respectively), and \( \varepsilon_i \) is the regression residual (for the MLR or GWR model) interpolated spatially with the OK approach. The OK prediction is a weighted linear combination of the available data. The linear coefficients (\( \lambda \)) are calculated under the condition of a uniformly unbiased predictor and under the constraint of minimal prediction error variance. The OK predicted value \( \varepsilon_i \) for location i can be expressed as (Cressie 1991; Wackernagel 2003):

$$ \varepsilon_i = \sum\limits_k {{\lambda_k}{\varepsilon_k}} $$
(13)
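How the weights in Eq. 13 can be obtained is sketched below in Python. The sketch solves the standard ordinary kriging system for one target location. The exponential variogram and its parameters (nugget, sill, range_m) are assumptions chosen only for illustration, since the variogram model fitted in this study is not specified here, and the function names are hypothetical.

```python
import numpy as np

def exponential_variogram(h, nugget=0.0, sill=1.0, range_m=1000.0):
    """Assumed variogram model: zero at h = 0, rising towards the sill."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_m))
    return np.where(h > 0, gamma, 0.0)

def ok_predict(coords, residuals, target, variogram=exponential_variogram):
    """Ordinary kriging of regression residuals at `target` (Eq. 13)."""
    n = len(residuals)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Variogram matrix bordered by the unbiasedness constraint sum(lambda) = 1.
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = variogram(d)
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    lam = np.linalg.solve(K, rhs)[:n]  # last entry is the Lagrange multiplier
    return lam @ residuals, lam
```

Following Eq. 12, the RK or GWRK value at a grid point would then be the regression prediction plus the first value returned by ok_predict.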

While GWR can be used for spatial interpolation of climate elements (Lloyd 2007), the novelty of this paper is the introduction of geostatistical (kriging, OK) interpolation of GWR residuals that has not been implemented before for climatological purposes. The composition of GWR and OK of residuals is named here as geographically weighted regression (residual) kriging (GWRK).

4.6 Evaluation of the interpolation results

Interpolation results derived with the spatialization techniques applied in this study (MLR, GWR, RK, and GWRK) were evaluated and compared using a cross-validation (CV) approach in which a single observation is removed from the original sample dataset (“leave-one-out” method) to be used as the validation data, and the remaining observations are used for interpolation. The procedure is repeated consecutively for all measuring sites, and the interpolation errors are calculated as the difference between the modeled and the observed values. The CV errors were used to calculate diagnostic measures, including the mean bias (BIAS), the root mean square error (RMSE), the mean absolute error (MAE), and the maximum and minimum errors (Willmott and Matsuura 2006). Statistical validation of the interpolation algorithms was complemented by visual analysis of the UHI spatial patterns and the spatial distribution of the cross-validation errors.
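The leave-one-out procedure and the diagnostic measures can be sketched in Python as follows. The interface is hypothetical: predict stands for any of the interpolators (MLR, GWR, RK, or GWRK), and mlr_predict is a plain least-squares stand-in included only to make the example runnable.

```python
import numpy as np

def loo_cv_errors(coords, X, y, predict):
    """Leave-one-out errors, defined as modeled minus observed values."""
    errors = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop site i, keep all others
        modeled = predict(coords[keep], X[keep], y[keep], coords[i], X[i])
        errors[i] = modeled - y[i]
    return errors

def cv_summary(errors):
    """BIAS, MAE, RMSE, and extreme cross-validation errors."""
    return {
        "BIAS": errors.mean(),
        "MAE": np.abs(errors).mean(),
        "RMSE": np.sqrt((errors ** 2).mean()),
        "MIN": errors.min(),
        "MAX": errors.max(),
    }

def mlr_predict(coords, X, y, target_coord, target_x):
    """Global least-squares predictor (stand-in for MLR; ignores coordinates)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0] + target_x @ beta[1:]
```

For example, cv_summary(loo_cv_errors(coords, X, y, mlr_predict)) would return the kind of statistics that are compared between methods.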

The spatial pattern of CV errors is also visualized on the final maps of air temperature (section 5) to express the following information:

  • The magnitude and sign of CV errors, which were first standardized and classified to be comparable between cases. These are represented by the symbol size and shape.

  • The tendency for clustering of high or low CV errors, which was described with the Local Moran’s index (Anselin 1995) and marked on the maps. This is done because the quality of a spatial model can also be evaluated in terms of the clustering tendency of errors (Fotheringham et al. 2002). A significant spatial tendency for clustering of very high or very low errors (outliers) suggests model misspecification in the region of the clustered errors. The spatial relations among the features in the Moran’s index calculation were conceptualized by the inverse distance method, with the threshold set to the maximum distance to the first neighbor (∼1,160 m). Due to the irregular distribution of sites, row standardization was used to generate the spatial weights matrix. The Moran’s index is used here as a quantitative measure of local spatial autocorrelation of CV errors. The white or black filling of the symbol distinguishes between a statistically significant (0.05 level) cluster of high values (HH) and a cluster of low values (LL). The gray color indicates either a non-significant cluster/outlier process or a significant outlier in which a high value is surrounded primarily by low values (HL) or a low value is surrounded primarily by high values (LH).
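The Local Moran’s index computation described in the list above can be sketched in Python. The sketch builds row-standardized inverse-distance weights within a distance threshold and returns the local statistic with its quadrant label. It uses the common form of the statistic with the second moment computed over all n sites, which is an assumption, and it omits the significance assessment at the 0.05 level, which would require an additional step such as conditional permutation. Function names are hypothetical.

```python
import numpy as np

def row_standardized_idw(coords, threshold):
    """Inverse-distance weights within `threshold`, each row scaled to sum to one."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.zeros_like(d)
    mask = (d > 0) & (d <= threshold)
    w[mask] = 1.0 / d[mask]
    row_sums = w.sum(axis=1, keepdims=True)
    # Sites without neighbors inside the threshold keep an all-zero row.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

def local_moran(values, w):
    """Local Moran's I and cluster/outlier type (HH, LL, HL, LH) per site."""
    z = values - values.mean()
    m2 = (z ** 2).mean()
    lag = w @ z  # weighted mean deviation of the neighbors
    local_i = z * lag / m2
    labels = np.where(z >= 0,
                      np.where(lag >= 0, "HH", "HL"),
                      np.where(lag >= 0, "LH", "LL"))
    return local_i, labels
```

With the CV errors as values and a threshold of about 1,160 m, positive local values with label HH or LL would correspond to the clusters of high or low errors marked on the maps.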

5 Results

5.1 Regression models

A statistical summary of the MLR models is given in Table 4. For all analyzed UHI cases, the global models were able to explain over 70% of the variance. Compared with the previous work of Szymanowski and Kryza (2009), there is a considerable increase in the explained variance for one UHI case, 3 January 2002 (Table 4). The remaining cases are characterized by similar or slightly higher determination coefficients, which leads to the conclusion that, when a 3D database and remotely sensed data are lacking, a precise land-use map plays a significant role in the UHI spatialization process.

Table 4 Regression analysis and spatial autocorrelation of regression residuals for the MLR model (for all UHI cases n = 206)

The process of GWR model specification was started by calibration of optimal kernel size. All analyzed UHI cases show similar behavior when changing the kernel size, so the results are shown with the example of 22 May 2001 case (Figs. 2, 3, and 4).

Fig. 2 Local regression parameter estimates (β) for various kernel sizes: (a) artificial heat emission (\( Q_A \)), (b) porosity (P), (c) SAVI, (d) DSI, and (e) roughness length \( z_0 \) for 22 May 2001

Fig. 3 Local regression statistics: (a) corrected Akaike Information Criterion, (b) estimated standard deviation for the residuals, and (c) global and local minimum and maximum determination coefficients for 22 May 2001

Fig. 4 Cross-validation errors: (a) GWR and (b) GWRK for 22 May 2001 (MAE mean absolute error, RMSE root mean square error)

Both AICc and σ decrease with decreasing kernel size, which indicates a better fit of the modeled data to the observations. This is also supported by the global \( R^2 \), which increases as the kernel size decreases. As the kernel size decreases, the variance of the local \( R^2 \) also increases, usually due to changes in the minimum \( R^2 \) (Table 5).

Table 5 Regression analysis and spatial autocorrelation of regression residuals for the GWR model

The analysis of changes in the local parameter estimates (β) showed a tendency similar to that of \( R^2 \): the smaller the kernel, the greater the variance of β, but also the less biased the model. This issue is known as the bias-variance trade-off: if regression coefficients vary continuously over space, weighted least-squares regression is unlikely to provide a completely unbiased estimate of β at a given point. However, zero bias does not guarantee an optimal estimator (Fotheringham et al. 2002). Moreover, for all analyzed UHI cases, we found at least one independent variable for which the use of small bandwidths causes a change of sign of the local β estimate. This means that in some parts of the study area, the model does not properly describe the physical processes affecting the air temperature field, and the given predictor is locally not statistically significant (β equal to zero). For example, in the case of 22 May 2001, changes of the β sign were observed for three independent variables, P, DSI, and \( z_0 \), at adaptive kernels of 49, 39, and 28 points, respectively (Fig. 2). For this reason, the optimum adaptive kernel size cannot be defined simply by selecting the best \( R^2 \), AICc, or σ statistics; other factors should also be considered. Therefore, the kernel size should be as small as possible and:

  • Not smaller than the assumed minimum number of points required for proper local model calibration (20 points in this study);

  • The smallest possible for which the β values do not change sign over the study area and therefore can be physically explained.
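The two conditions above can be expressed as a simple selection rule, sketched here in Python. The input structure is hypothetical: a mapping from each tested adaptive kernel size to the array of local slope estimates (one row per location, one column per predictor, intercept excluded) obtained from a GWR run with that kernel size.

```python
import numpy as np

def select_kernel_size(local_betas_by_size, min_points=20):
    """Smallest kernel size >= min_points whose local slopes never change sign.

    local_betas_by_size: dict mapping kernel size (number of points) to an
    array of shape (n_locations, n_predictors) of local slope estimates.
    Returns None if no tested size satisfies both conditions.
    """
    for size in sorted(local_betas_by_size):
        if size < min_points:
            continue  # too few points for proper local calibration
        betas = np.asarray(local_betas_by_size[size])
        # A predictor is physically explainable if its slope is all-positive
        # or all-negative over the whole study area.
        same_sign = np.all(betas > 0, axis=0) | np.all(betas < 0, axis=0)
        if same_sign.all():
            return size
    return None
```

The sketch checks only the sign condition. The AICc, σ, and \( R^2 \) diagnostics would still be inspected alongside it, as described in the text.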

The resulting kernel sizes that meet these conditions are summarized in Table 5.

The analysis of Moran’s I index of GWR residuals showed a significantly better specification of the GWR model in comparison to MLR. For the MLR model, spatial autocorrelation of regression residuals was detected for all UHI cases, suggesting misspecification of the model due to the non-stationarity of the spatial process (Table 4). After the implementation of GWR, a statistically significant tendency for clustering of similar residuals was observed only for 15 February 2002 (Table 5).

The straightforward comparison of the MLR and GWR models in each UHI case was performed by analyzing the AICc values, the global determination coefficient \( R^2 \), and the ANOVA results. The analysis of AICc revealed a better fit of the GWR model to the observations than that of MLR. This was also supported by the analysis of global determination coefficients, which showed in all cases an increase in the air temperature variation explained by the model, up to 79–88% and, locally, 91%, leaving only in two cases (26 June 2001, 3 January 2002) about 9% of variation unexplained (Table 5). In general, the GWR estimates significantly reduced the residual sum of squares compared with the MLR estimates (Table 6). The tests were statistically significant (F statistics) and showed that the GWR model performed better than MLR.

Table 6 Comparison between MLR and GWR model—ANOVA test

In the last step of GWR model calibration, non-stationarity tests for regression parameters were performed (Table 7). The results of the two tests are generally similar, but the existence of spatial instability was detected for some parameter estimates by both tests. This could lead to the conclusion that for four out of seven analyzed cases, a mixed GWR model may be more appropriate or some independent variables should be excluded from the model, which is not the case in this study.

Table 7 Monte Carlo and F3 tests for local parameters estimates non-stationarity

The results presented so far suggest that, for all analyzed UHI cases, local geographically weighted regression models describe the UHI structure significantly better than global, ordinary least-squares models, and that there is a strong statistical basis for applying local rather than global regression models for UHI spatialization.

5.2 Spatial interpolation

The GWR and MLR models, together with their extension by kriging of residuals, have been applied for interpolation of UHI. The interpolation results are evaluated with the cross-validation procedure to quantify the possible interpolation and extrapolation error for each approach.

For all interpolation algorithms, the mean error (BIAS) is close to zero; therefore, there is no general tendency towards under- or overestimation. The GWR algorithm usually results in a small positive BIAS, which is reduced after adding the interpolated residuals in the GWRK method (Table 8). The MAE and RMSE statistics show better performance of GWR than MLR for all analyzed UHI cases (Table 8). However, for all cases, adding the stochastic part (interpolation of residuals) to the GWR or MLR results led to a significant decrease in CV errors, and MAE and RMSE are similar for GWRK and RK. Comparing the extremes of the CV errors, the most extreme minima and maxima, and thus the largest range of cross-validation errors, are found for MLR. GWR is characterized by reduced extremes and range, and the smallest ones are produced by the combined approaches, RK and GWRK (Table 8).

Table 8 Cross-validation results for the selected interpolation methods

Two selected UHI cases are illustrated on maps. They represent two groups of UHI generated under different wind conditions, which were found to be one of the most important factors generating spatial non-stationarity and influencing the spatialization results:

  • UHI generated during weak winds, with UHI circulation (22 May 2001; Table 2, Fig. 5),

  • UHI generated and shifted to the leeward side due to weak but stable regional winds (3 January 2002; Table 2, Fig. 6).

Fig. 5 Air temperature [degree Celsius] and standardized cross-validation errors (circles for negative and squares for positive errors) together with statistically significant clustering tendency of high (white fillings) or low (black fillings) values (gray fillings indicate either a non-significant cluster/outlier process or a significant outlier in which a high value is surrounded primarily by low values or a low value is surrounded primarily by high values) for 22 May 2001

Fig. 6 Air temperature [degree Celsius] and standardized cross-validation errors (circles for negative and squares for positive errors) together with statistically significant clustering tendency of high (white fillings) or low (black fillings) values (gray fillings indicate either a non-significant cluster/outlier process or a significant outlier in which a high value is surrounded primarily by low values or a low value is surrounded primarily by high values) for 3 January 2002

For 22 May 2001 (Fig. 5), the application of the MLR model resulted in underestimations over the NE part of the city and in the northern part of the “peak” zone of the UHI. In contrast, MLR resulted in overestimations in the S-SW part of the city. This tendency is reduced when using GWR but is not entirely removed. In the case of 3 January 2002, MLR is not able to detect an external factor, i.e., the regional NE-ENE wind. This can be recognized in the spatial distribution of CV errors, with underestimations in the southern parts and overestimations in the northern part of the city (Fig. 6). The GWR model works better in this case (cross-validation errors are smaller than for MLR) and improves the quality of the interpolation, but it still does not fully capture the role of wind in shifting the UHI structure.

It should be stressed that the incorporation of the stochastic part in the spatialization process (RK, GWRK) improves the cross-validation results from both the quantitative and the visual point of view. The statistically significant tendency to cluster similar CV errors is actually eliminated, and the zones of over- or underestimation no longer exist (Figs. 5 and 6).

6 Summary and conclusions

For the purpose of this study, a set of new potential predictors of the UHI was derived from satellite imagery (Landsat ETM+) and a 3D LIDAR-originated database. Computationally intensive derivatives of these data, including daily sums of solar irradiation, roughness length, porosity, sky view factor, or land surface temperature, did not improve the global regression model compared to the results published earlier by Szymanowski and Kryza (2009). This leads to the conclusion that, when a 3D database and remotely sensed data are lacking, a land-use map and its derivatives are sufficient for spatial interpolation of UHI. The gain from applying more complex independent variables was significant here in only one UHI case out of seven.

With the given set of spatially continuous UHI predictors, it was not possible to propose one general regression model built on a universal subset of independent variables. This is because, for various mesoscale meteorological conditions or seasons, different factors may be responsible for the spatial pattern of UHI.

The analysis of spatial autocorrelation of regression residuals with the Moran’s I index showed a statistically significant tendency for clustering of MLR residuals for all UHI cases, which means that the model was misspecified due to the non-stationarity of the spatial process. For the GWR model, such misspecification was observed in only one case. For this reason, local regression techniques, which are dedicated to non-stationary processes, should be used for the analysis of meteorological phenomena like UHI. This conclusion is also supported by the other tests used to compare the MLR and GWR models, including AICc, \( R^2 \), and ANOVA. However, the Monte Carlo and F3 tests for the significance of spatial variance of local β estimates pointed out that for two cases analyzed, a mixed GWR–MLR approach could be justified.

The locally weighted regression model (GWR) was built using the same independent variables that were used for the global MLR model, and the basic assumption was to retain, in all subareas of the city, the possibility of physical interpretation of the model (the deterministic regression model). Due to the irregular distribution of the sampling points in space, an adaptive kernel (Gaussian shape) was used instead of a fixed one. The bandwidth size was selected with an iterative procedure that included the analysis of the corrected Akaike Information Criterion, the standard deviation of the residuals, global and local determination coefficients, and local parameter estimates. The objective when choosing the optimum bandwidth size was to keep it as small as possible while ensuring that, for the entire study area, the final GWR model is physically explainable for all independent variables. The physical correctness of the regression equation is crucial if the model is applied to derive air temperature over areas not covered by measurements, i.e., used for extrapolation.

Comparing the spatialization results achieved by the MLR and GWR techniques, one should stress that although the maps look similar, the latter has a strong advantage in better recognizing the non-stationary characteristics of the spatial process, which was shown with the various statistics above. Generally, MLR assumes constant relationships with land-use and remotely sensed derivatives, while GWR is designed to work locally, and the combination of local models gives a better fit to the observed data when an external, non-stationary process is noticeable. The goodness of fit of the GWR model is a function of the kernel size: the smaller the kernel, the better the expected fit. Two main reasons limit how far the kernel size can be decreased. The first is statistical: too many independent variables for too few observations lead to misspecification of the model. The second is that the physical interpretation of the local model is often lost for a given predictor if the selected kernel size is too small. The main assumption of our model was to ensure proper deterministic relations over the study area. It was also shown, by comparison of the current and previously published results, that the incorporation of more advanced spatial predictors does not necessarily improve the interpolation results, expressed in terms of cross-validation errors. The GWR and MLR results can be significantly improved by adding the stochastic part of the process, i.e., interpolation of the regression residuals (the RK and GWRK procedures). The results of those procedures are similar in terms of both the statistical characteristics and the spatial distribution of the CV errors. The main reason for recognizing GWRK as the most appropriate method is its statistical correctness in the presence of phenomena that are non-stationary and unexplained by the independent variables.