To minimize the effect of elevation (H), latitude (LAT) and longitude (LON) on the multi-temporal LST dataset, a data transformation is required that produces a new set of images presenting high correlations with the three independent variables under consideration (Miliaresis 2012a). In this context, principal components analysis (PCA) is a linear transformation technique that produces a set of images known as principal components (PCs), which are uncorrelated with one another and are ordered by the amount of variance (eigenvalues) they explain in the original image set (Jolliffe 2002). In previous research efforts (Miliaresis 2012a, b, c, 2013, 2014a), PCA was applied by computing the eigenvalues and eigenvectors from the cross-correlation matrix of the LST data. The resulting eigenvalues and eigenvectors therefore correspond to the standardized LST data (each month has a mean of zero and a standard deviation of 1). The PCs are then computed as linear combinations of the eigenvectors and the corresponding pixel values of the initial images (Mather and Koch 2011). Finally, linear regression models are applied to the first two PCs to split off the variance associated with H, LAT and LON, which is contained in the predicted images (Miliaresis 2012a, b, c).
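The link between the cross-correlation matrix and standardized data can be illustrated with a minimal Python sketch (Python is used here for illustration only; the function names and the LST values are hypothetical). It shows that the covariance of two standardized months equals the correlation of the raw months, which is why a PCA of the correlation matrix is equivalent to a PCA of standardized data.

```python
import math

def standardize(column):
    """Return z-scores: zero mean and unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Pearson correlation coefficient of two sequences."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical LST values (degrees Celsius) of five pixels in two months.
january = [-4.0, -1.5, 2.0, 5.5, 8.0]
july = [12.0, 15.0, 14.0, 21.0, 24.0]

z_january, z_july = standardize(january), standardize(july)

# Covariance of the standardized months equals the correlation of the raw months.
cov_of_standardized = covariance(z_january, z_july)
corr_of_raw = correlation(january, july)
```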
The ANOVA table for each regression verifies its statistical significance (Miliaresis 2013, 2014a). Model performance is further assessed by R² (R is the multiple correlation coefficient between the independent variables and the dependent variable), which represents the proportion of variability in the dependent variable explained by all the independent variables (Landam and Everitt 2004). The multi-temporal data set is then reconstructed from the residual images of the first two PCs together with the remaining PCs.
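As a minimal sketch of the R² measure described above (written in Python for illustration; the function name and the sample values are hypothetical), the coefficient of determination can be computed from the observed and the predicted values of the dependent variable:

```python
def r_squared(observed, predicted):
    """Proportion of the variance of `observed` explained by `predicted`."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: variability left unexplained by the model.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variability around the mean.
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total

# Hypothetical PC scores of four pixels.
scores = [1.0, 2.0, 3.0, 4.0]

perfect_fit = r_squared(scores, scores)       # model reproduces every score
mean_only_fit = r_squared(scores, [2.5] * 4)  # model predicts only the mean
```

A model that reproduces the scores exactly yields R² = 1, while a model that predicts only the mean yields R² = 0.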
Study area
The study area (Fig. 1) is bounded by longitudes −124° to −112° (West) and latitudes 32° to 44° (North) and includes the states of California, Nevada, Utah and Arizona. In the NW of the study area (California), the climate is characterized by moderately cold winters with heavy snowfall in the mountains (Sierra Nevada Ranges) and warm, very dry summers with limited rainfall, especially in the south (Wang and Gillies 2012).
The central and eastern parts of the study area (Nevada, Arizona and Utah) are mostly formed by a series of parallel mountain ranges with intervening flat basins (Fig. 1). The climate is generally semi-arid or arid with warm summers and cold winters, but this varies with location and elevation (Wang and Gillies 2012), since some mountainous areas are high enough to experience an Alpine climate. The majority of streams and rivers flow into desert sinks or closed-basin lakes, while the Colorado River crosses the Grand Canyon in the SE (Barnett and Pierce 2008). To the south, the study area is occupied by the Mojave Desert (California) and the Sonoran Desert (Arizona).
Data
The SRTM30 digital elevation model (DEM) (Farr and Kobrick 2000; SRTM30 DEM 2015), with a spatial resolution of 0.00833 degrees (approximately 1 km at the equator), provides the elevation representation of the study area (Fig. 1). A geographic latitude/longitude grid is used with WGS 84 as the horizontal datum (SRTM30 DOC 2015). Elevation ranges from −83 to 4097 m. Negative elevations (below sea level) are observed in Death Valley (California).
LST data are acquired at around 01:30, 10:30, 13:30 and 22:30 local solar time by the MODIS instruments on board the Aqua and Terra polar-orbiting satellites (MYD11C2 2011). According to Wan (2014), the LST accuracy is better than 1 K (0.5 K in most cases) under real clear-sky conditions. Data from real clear-sky conditions within a calendar month are averaged to yield the MYD11C2 product for Aqua and the MOD11C3 product for Terra (Wan 2013). The MYD11C2 and MOD11C3 data sets provide a continuous (monthly averaged LST) sampling of the earth’s surface with a spatial resolution of 3 arc-minutes (0.05°, corresponding to approximately 5.6 km at the equator) referenced to a geographic latitude/longitude grid, with WGS 84 as the horizontal datum (Wan 2013).
The MODIS data products have been substantially improved by a new processing algorithm (Wan 2013), and the version 6 data products (Wan 2014) are gradually being released to the public (MYD11C2.006 2016). The Aqua MODIS night (acquired daily at 01:30) monthly averaged LST data are used in this study. The 12 night monthly averaged LST images for 2007 are visualized in Fig. 2. The monthly LST frequency distributions are presented in Fig. 3.
The data may also be summarized by the cross-correlation matrix in Table 1, which indicates a rather season-dependent correlation of LST with H, LAT and LON.
Table 1 Cross correlation matrix
Density slicing of the November LST image is presented in Fig. 4. One slice includes the pixels with LST < 0 °C, while the pixels with LST > 0 °C are found in the coastal regions.
SVR software implementation
SVR is a modular, flexible, open-source GNU Octave 4.0 script for selective variance reduction of multi-temporal data sets. The SVR script, the supporting functions, the visualization scripts and the study area data are freely available under the GNU General Public License, Version 3. A website has been set up to facilitate the distribution (https://sourceforge.net/projects/selective-variance-reduction/). Various scripts and functions are included that allow alternative processing options and visualizations.
There are 48,224 data (land) pixels in the study area. The data are stored in a single Matlab file, california.mat, that contains 4 matrices. The vector representation is selected, so each data element (vector) holds the 12 monthly LST values of a specific pixel. Thus, the LST matrix has 48,224 rows (vectors) and 12 columns corresponding to the monthly averaged night LST from January to December 2007. The LST vectors are visualized in Fig. 5. There are 3 more one-dimensional matrices, named H, LAT and LON, that hold the elevation, latitude and longitude of each vector (pixel).
The main script is termed SVR.m (Table 2). SVR is an acronym for selective variance reduction, indicating that the variance associated with H, LAT and LON is subtracted from the multi-temporal LST data.
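The vector representation described above can be sketched as follows. This is an illustrative Python sketch, not the layout code of the SVR distribution: the function name, the land mask and the tiny 2 × 2 grids are assumptions made for the example.

```python
def images_to_vectors(monthly_images, land_mask):
    """Stack per-month grids into one row (vector) per land pixel.

    monthly_images: list of 2-D grids (lists of rows), one grid per month.
    land_mask: 2-D grid of booleans, True where the pixel holds data (land).
    Returns the vectors and the (row, column) position of each vector.
    """
    vectors, positions = [], []
    for r, mask_row in enumerate(land_mask):
        for c, is_land in enumerate(mask_row):
            if is_land:
                # One value per month for this pixel.
                vectors.append([image[r][c] for image in monthly_images])
                positions.append((r, c))
    return vectors, positions

# Hypothetical 2 x 2 grids for three months; one pixel is sea (no data).
images = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
mask = [[True, False], [True, True]]

lst_matrix, pixel_positions = images_to_vectors(images, mask)
```

With 12 monthly grids and the land mask of the study area, the same procedure would yield a matrix with one row per land pixel and 12 columns.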
There are two function calls in the SVR script (Table 2):
a. PCAfunc.m (Table 3) that computes the eigenvalues and eigenvectors from the variance covariance matrix of LST, and
b. Normal_equation.m (Table 3) that performs the linear regressions of PC1 and PC2 versus H, LAT and LON and computes the two residual images that are used in LST image reconstruction.
Table 3 PCAfunc.m and normal_equation.m scripts
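The regression step can be illustrated with a minimal Python sketch of a least-squares fit through the normal equation (X'X) b = X'y. This is not the code of Normal_equation.m: the function names, the synthetic predictor values (elevation in km and coordinate offsets from an arbitrary center) and the coefficients are assumptions chosen so that the example stays numerically well conditioned.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copying keeps the inputs untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_normal_equation(h, lat, lon, y):
    """Return [b0, bH, bLAT, bLON] solving (X'X) b = X'y."""
    rows = [[1.0, a, b, c] for a, b, c in zip(h, lat, lon)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, h, lat, lon):
    """Evaluate the fitted linear model for every pixel."""
    return [coef[0] + coef[1] * a + coef[2] * b + coef[3] * c
            for a, b, c in zip(h, lat, lon)]

# Hypothetical predictors for six pixels.
h_km = [0.1, 1.5, 2.3, 0.05, 0.9, 3.1]
lat_offset = [-5.0, -1.5, 2.0, 5.0, -3.5, 0.0]
lon_offset = [-5.0, -2.0, 0.0, 4.0, 2.0, 5.0]

# Synthetic scores generated from known (hypothetical) coefficients.
true_coef = [2.0, -1.2, -0.3, 0.15]
pc1 = predict(true_coef, h_km, lat_offset, lon_offset)

coef = fit_normal_equation(h_km, lat_offset, lon_offset, pc1)
fitted = predict(coef, h_km, lat_offset, lon_offset)
pc1_residuals = [y - f for y, f in zip(pc1, fitted)]
```

Because the synthetic scores are an exact linear function of the predictors, the fit recovers the generating coefficients and the residuals vanish; with real PC scores the residuals would form the residual image.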
The Eig function (Table 3) is applied to the variance–covariance matrix for the computation of the PCs (Eig 2015); thus the reconstructed LST imagery expresses thermal anomalies in degrees Celsius. Translation by mean is applied before the Eig function is called (Table 3) in an attempt to improve the accuracy of the numerical computations. Translation by mean uses the differences between the variables (monthly averaged LST) and their sample means instead of the raw monthly data. Translation does not affect the interpretation, because the variances of the translated variables are the same as those of the original variables.
The eigenvectors returned by the Eig function of Octave (Eig 2015) are not ordered. For this reason, the eigenvectors are sorted in descending order of variance (eigenvalue) in PCAfunc.m (Table 3). The PCs for the LST data of the study area are presented in Table 4.
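The statement that translation by mean leaves the variances unchanged can be checked with a short Python sketch (the function names and the LST values are hypothetical):

```python
def center(column):
    """Translate a variable by its sample mean."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def sample_variance(column):
    """Unbiased sample variance."""
    n = len(column)
    mean = sum(column) / n
    return sum((v - mean) ** 2 for v in column) / (n - 1)

# Hypothetical monthly averaged LST (degrees Celsius) of five pixels.
march = [3.0, 7.5, -2.0, 10.0, 1.5]
march_centered = center(march)

variance_raw = sample_variance(march)
variance_centered = sample_variance(march_centered)
```

The centered variable has a mean of zero but the same variance as the raw variable, so the eigenvalues of the variance–covariance matrix are unaffected by the translation.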
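The sorting step can be sketched in Python as follows. This is an illustration rather than the code of PCAfunc.m; the function names and the eigenvalues are hypothetical, and the eigenvectors are assumed to be stored as the columns of a matrix.

```python
def sort_eigenpairs(eigenvalues, eigenvectors):
    """Sort eigenpairs by descending eigenvalue.

    eigenvectors is a matrix (list of rows) whose k-th column is the
    eigenvector that belongs to eigenvalues[k].
    """
    order = sorted(range(len(eigenvalues)),
                   key=lambda k: eigenvalues[k], reverse=True)
    sorted_values = [eigenvalues[k] for k in order]
    # Reorder the columns of every row consistently with the eigenvalues.
    sorted_vectors = [[row[k] for k in order] for row in eigenvectors]
    return sorted_values, sorted_vectors

def explained_variance_percent(eigenvalues):
    """Share of the total variance carried by each component, in percent."""
    total = sum(eigenvalues)
    return [100.0 * v / total for v in eigenvalues]

# Hypothetical unordered output of an eigen-decomposition.
values = [0.5, 8.0, 1.5]
vectors = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

sorted_values, sorted_vectors = sort_eigenpairs(values, vectors)
percent = explained_variance_percent(sorted_values)
```

After sorting, the first column of the eigenvector matrix belongs to the largest eigenvalue, so PC1 carries the largest share of the variance.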
Table 4 Eigenvectors, eigenvalues and principal components (PCs) of the multi-temporal LST data
In the SVR.m script (Table 2), the PCs (Table 4) are computed as linear combinations of the eigenvectors and the corresponding pixel values of the initial images.
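The linear combination described above amounts to multiplying the mean-centered data matrix by the eigenvector matrix. The following Python sketch illustrates this with a hypothetical two-month data set whose covariance matrix has the eigenvectors (1, 1)/√2 and (1, −1)/√2; all names and values are assumptions made for the example.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sample_covariance(a, b):
    """Sample covariance of two mean-centered sequences."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)

# Hypothetical mean-centered LST: four pixels (rows), two months (columns).
centered_lst = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]

# Eigenvectors of the covariance matrix of this data set, one per column,
# already sorted by descending eigenvalue (6 and 2/3).
s = 1.0 / math.sqrt(2.0)
eigenvectors = [[s, s], [s, -s]]

# PC scores: each score is a linear combination of a pixel's monthly values.
scores = matmul(centered_lst, eigenvectors)
pc1 = [row[0] for row in scores]
pc2 = [row[1] for row in scores]
```

In this example the variance of the PC1 scores equals the first eigenvalue, the variance of the PC2 scores equals the second, and the two score columns are uncorrelated.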
The contribution of the independent variables (H, LAT, LON) to PC1 and PC2 (dependent variables) is quantified by the linear regression models in Eqs. (1) and (2).
$${\text{PC1}} = 135.85198 - 0.011758 \times {\text{H}} - 3.39332 \times {\text{LAT}} - 0.22067 \times {\text{LON}}$$
(1)
$${\text{PC2}} = -186.8823 + 0.00048079 \times {\text{H}} - 0.360232 \times {\text{LAT}} - 1.47717 \times {\text{LON}}$$
(2)
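The two regression models can be evaluated per pixel as in the following Python sketch. The coefficients are those of Eqs. (1) and (2); the function names, the assumption that H is given in metres and LAT and LON in decimal degrees (with negative longitudes to the west), and the example pixel are illustrative assumptions.

```python
def predict_pc1(h, lat, lon):
    """PC1 predicted from elevation, latitude and longitude, Eq. (1)."""
    return 135.85198 - 0.011758 * h - 3.39332 * lat - 0.22067 * lon

def predict_pc2(h, lat, lon):
    """PC2 predicted from elevation, latitude and longitude, Eq. (2)."""
    return -186.8823 + 0.00048079 * h - 0.360232 * lat - 1.47717 * lon

def residual(observed_score, predicted_score):
    """Part of a PC score that is not explained by H, LAT and LON."""
    return observed_score - predicted_score

# Hypothetical pixel: 1500 m elevation at 38 degrees N, 118 degrees W.
pc1_hat = predict_pc1(1500.0, 38.0, -118.0)
pc2_hat = predict_pc2(1500.0, 38.0, -118.0)

# Hypothetical observed PC1 score of that pixel and its residual.
pc1_residual = residual(-10.0, pc1_hat)
```

The residuals computed in this way for every pixel form the residual images of PC1 and PC2.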
The R² (Table 5) indicates the amount of variance explained by the multiple linear regression models (Landam and Everitt 2004).
Table 5 ANOVA tables for Eqs. (1) and (2) (df: degrees of freedom)
The analysis of variance (ANOVA) tables for Eqs. (1) and (2) are presented in Table 5. The F-statistic and the t-statistic (Landam and Everitt 2004) are used in combination to assess (a) the overall success of the regression models and (b) the significance of the individual independent variables (that is, whether variables should be added or deleted), respectively. The F value for Eq. (1) indicates the overall significance of the regression, since it far exceeds the F-critical value (26.12) at the 0.01 significance level (Table 5). For Eq. (1), the coefficients for H, LAT and LON (Table 5) express the individual contributions of the independent variables to PC1. The absolute values of the t-statistics for the 3 independent variables (Table 5) far exceed the t-critical value (2.58) at the two-tailed 0.01 significance level, and hence their coefficients depart significantly from 0.
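The relation between R² and the overall F-statistic of a multiple regression can be sketched in Python as follows. The function name is hypothetical and the R² value in the example is a placeholder, not a value from Table 5; only the number of pixels (48,224) and the number of independent variables (3) are taken from the text.

```python
def f_statistic(r_squared, n_observations, n_predictors):
    """Overall F-statistic of a multiple linear regression with intercept."""
    df_regression = n_predictors
    df_residual = n_observations - n_predictors - 1
    explained = r_squared / df_regression
    unexplained = (1.0 - r_squared) / df_residual
    return explained / unexplained

# 48,224 pixels and 3 independent variables (H, LAT, LON);
# the R-squared value of 0.5 is a placeholder for illustration.
f_value = f_statistic(0.5, 48224, 3)

# Comparison with the critical value quoted in the text.
is_significant = f_value > 26.12
```

With tens of thousands of pixels, even a moderate R² produces an F value far above the critical value, which is consistent with the very large F values that are typical of regressions on dense raster data.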
The ANOVA table (Table 5) for Eq. (2) also verifies the overall significance of the multiple regression model for PC2, as well as the significance of the 3 independent variables.
The SVR.m script reconstructs the PC scores (Scores2 in Table 2) from the regression residuals of PC1 and PC2 plus the PC3 to PC12 components. The inverse transformation (Table 2) then computes the reconstructed LST (RLST) images by multiplying the Scores2 matrix by the transpose of the eigenvector matrix (Table 4).
The 12 night monthly averaged reconstructed LST (RLST) images for 2007 are visualized in Fig. 6. The monthly RLST frequency distributions are presented in Fig. 7, while the RLST vectors are presented in Fig. 8. Descriptive statistics (Landam and Everitt 2004) for the RLST frequency distributions are available in Table 6.
Table 6 RLST descriptive statistics (thermal anomaly in degrees Celsius)
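The inverse transformation can be illustrated with a small Python sketch. It is a two-component toy example under stated assumptions, not the SVR.m code: only PC1 is replaced by its regression residual, PC2 plays the role of the components that are kept unchanged, and the eigenvectors, scores and predicted values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(column) for column in zip(*m)]

# Hypothetical orthonormal eigenvectors (one per column) and the PC scores
# of four pixels of the mean-centered data
# [[-2, -1], [-1, -2], [1, 2], [2, 1]].
s = 1.0 / math.sqrt(2.0)
eigenvectors = [[s, s], [s, -s]]
scores = [[-3 * s, -s], [-3 * s, s], [3 * s, -s], [3 * s, s]]

# Hypothetical PC1 values predicted from H, LAT and LON by the regression.
pc1_predicted = [-2.0, -2.0, 2.0, 2.0]

# Scores2: PC1 is replaced by its regression residual, PC2 is kept as is.
scores2 = [[row[0] - p] + row[1:] for row, p in zip(scores, pc1_predicted)]

# Inverse transformation: Scores2 times the transposed eigenvector matrix.
reconstructed = matmul(scores2, transpose(eigenvectors))

# Without any change to the scores, the inverse returns the centered data.
round_trip = matmul(scores, transpose(eigenvectors))
```

Because the eigenvector matrix is orthonormal, the unmodified scores reproduce the mean-centered data exactly, and the reconstructed data differ from it only by the predicted PC1 values projected along the first eigenvector.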