Background

The goal of this paper is to set the theoretical framework of PCA in remote sensing of burned areas. Specific objectives are to introduce the reader to the (i) EVD-based PCA and the significance of mean-centering and scaling as pre-processing steps; (ii) SVD, an alternative solution to PCA and its difference against EVD; (iii) existing EVD- and SVD-based PCA applications in remote sensing; (iv) the methodological concepts of remotely sensing burned areas via the EVD-based PCA; (v) four SVD-based PCA versions for burned area mapping (vi) the implications of mean-centering and scaling multi-dimensional data prior to PCA. Finally we link the presented theoretical concepts to the remote sensing of burned areas and suggest that a non-centered SVD may perform better than the EVD-based PCA in capturing burned areas.

Principal components analysis

Principal components analysis (PCA) or transformation (PCT) is a non-parametric, orthogonal linear transformation of correlated variables [1, 2]. Being probably the oldest and most well-known multivariate analysis technique, PCA is useful in a wide range of applications including data exploration and visualisation of underlying patterns within correlated data sets; decorrelation; detection of outliers; data compression; feature reduction; enhancement of visual interpretability; improvement of statistical discrimination of clusters; ecological ordination; and more [3].

While the transformation can expose the internal structure of multivariate data sets, it is by no means optimised for class separability. PCA does not analyse class labels but uses global statistics to derive the transformation parameters [4]. Hence, there is no guarantee that the directions of maximum variance enhance class separabilities. It is up to the user to identify a high signal-to-noise ratio, via visual or quantitative inspection, of a feature of interest within the principal components. In short, PCA supports cluster seeking applications but cannot replace the need for user input.

Applying the EVD-based principal components transformation (henceforth noted as PCA) assumes linearity (or high data correlation); considers the statistical importance of the mean and the variance-covariance; and bases on large variances to highlight useful discrimination properties. In its classical form, the algorithm of the transformation bases on EVD and performs the following steps: i) organises the data set in a matrix, ii) mean-centers the columns of the matrix (henceforth also referred to as mean centering or just centering 1), iii) calculates the covariance matrix (non-standardised PCA) or the correlation matrix (standardised PCA, a step also known as scaling), iv) calculates the eigenvectors and the eigenvalues of the data’s variance-covariance or the correlation matrix, v) sorts the variances (i.e. the eigenvalues) in decreasing order, and finally vi) projects the original dataset signals into what is named Principal Components, or scores, by multiplying them with the eigenvectors which act as weighting coefficients.

Mathematically, PCA can be described as a set of p-dimensional vectors of weights or loadings w (k)=(w 1, …, w p )(k) that map each row vector x i , of a zero-mean matrix X, to a new vector of principal component scores t (i)=(t 1, …, t m )(i). The scores are given by t k (i)=x (i)·w (k) for i=1, …, n and k=1, …, m in a way that the individual variables of t capture successively the maximum possible variance from x, with each loading vector w constrained to be a unit vector [3]. The complete decomposition of X can be given as

$$ Y=W'X $$
(1)

where W is a p-by-p matrix (noting that W is transposed) whose columns are the eigenvectors of XX determined by the algorithm and Y the matrix of the transformed data. In the case of remotely sensed multispectral images, we consider n spectral response observations (pixel values) on b spectral bands. Following the steps outlined above, the algorithm using the covariance matrix

  1. 1.

    Starts with arranging the spectral responses as n vectors x 1, …, x n and placing them as rows in a single matrix X of dimensions n×b where the columns correspond to the b spectral bands.

  2. 2.

    For each band b, its mean m is subtracted from all of its spectral response values. Hence all bands present now a zero mean.

  3. 3.

    Next, it calculates the covariance (or correlation) matrix Σ x of the input data given by

    $$ \Sigma{}_{X}=\frac{\sum\limits^{n}_{k=1}(x_{k}-m)(x_{k}-m)'}{n-1} $$
    (2)

    We emphasize that calculating the covariance matrix, does actually mean-center the data beforehand. It is of explanatory importance, however, useful to present it as a separate step. The covariance describes the scatter or spread of the spectral responses in the multispectral space. It is symmetric and therefore has orthogonal eigenvectors and real eigenvalues. Alternatively, to perform PCA on the correlation matrix, instead of the covariance matrix, each band is standardized, i.e. divided by its standard deviation.

  4. 4.

    Successively, the algorithm computes the matrix D of eigenvectors which diagonalises the covariance matrix Σ x . The respective equation is

    $$ \Lambda=D{\prime}\Sigma_{x}D $$
    (3)

    where Λ is the diagonal matrix of eigenvalues of Σ x . It is shown that the covariance matrix of the transformed data Σ y can be identified as the diagonal matrix of eigenvalues of Σ x [5]. The relationship between Σ y and Σ x is

    $$ \Sigma_{y}=D{\prime}\Sigma_{x}D $$
    (4)

    where Σ y the covariance of the transformed data, which must be diagonal, and D the matrix of eigenvectors of Σ x , provided that D is an orthogonal matrix.

  5. 5.

    The last step sorts the eigenvectors in D and the eigenvalues in matrix Λ in decreasing order. The eigenvector with the highest eigenvalue, which corresponds to the highest variance, is the first principal component of the data.

Mean-centering, scaling and the eigendecomposition

Mean centering the multivariate data matrix is considered to be integrated in EVD. It is achieved by subtracting the mean of each variable from its own observations. This results in a zero mean vector of observations [6] and successively a zero mean matrix. Graphically, this action translates to an origin shift of the original scatter plot in its gravity center without altering its shape (Fig. 1). This is justified in terms of finding a basis that minimises the mean square error of approximating the original data.

Fig. 1
figure 1

Conventional mean-centered EVD-based PCA on the covariance matrix of a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The original coordinate system (light greyed) has been shifted after centering. The point-scatter has been rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal

Scaling the centered variables to have unit variance before the analysis takes place is optional. This action, also referred as standardisation, forces the variables to be of equal magnitude by altering the range of the point swarm (Fig. 2). This is required when variables are measured in different units and may be useful when their ranges vary substantially. We note that scaling uncentered variables does not result in unit variance. The usefulness of this combination may even be questionable from a matehematical point of view. However, we include this combination in our analytical framework for the sake of experimental completeness.

The eigendecomposition of a symmetric matrix yields eigenvectors. Each eigenvector has a positive eigenvalue. Theeigen vectors define the direction of the variation and the eigenvalues define proportionally the length of the axes of variation [7]. Geometrically, the first principal component points in the direction with the largest variance. The second component; being orthogonal to the first, points to the second largest variance. The same pattern is repeated for all remaining components.

Fig. 2
figure 2

Conventional mean-centered EVD-based PCA on the correlation matrix, thus scaled, of a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The original coordinate system (light greyed) has been shifted after centering. The shape of the point-scatter has altered after scaling and rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal

PCA rotates actually the point-scatter around its centroid and aligns to the coordinate axes so as to maximise the spread of the data projected onto them. It may be for instructive reasons that authors refer to PCA as a rotation of the coordinate axes [8]. This spread is the sum of squares of the data’s coordinate points along the axes, also known as principal component scores. In addition, the new axes are not correlated. The scores individuated by centered data are expressed as deviations from the mean of the variable. The rotation translates mathematically in a weighted linear combination of the original dimensions.

SVD, an alternative solution to PCA

Another matrix decomposition algorithm which, among other uses, computes the least squares fitting of data, is the Singular Value Decomposition (SVD). The algorithm yields the minimum number of dimensions required to represent a matrix or linear transformation. Clark and Clark [9] describe SVD as being essentially equivalent to a least squares method but numerically more robust according to [10]. Wu [11] notes that better numerical properties are important when higher order dimensions are required. Specifically, when computing the covariance matrix (equation 2), prior to EVD, it is the multiplication of the matrix X by itself that may lead to loss of numerical accuracy. EVD, upon which the conventional PCA is based, and SVD have similar properties and products. Furthermore, PCA can be seen as SVD applied on a column-centered data matrix [6]. Both algorithms are non-sequential and extract hidden variables simultaneously. Yet they differ in several aspects. EVD works on the covariance (or correlation) matrix while SVD operates on the raw data matrix. The main difference of interest, in this work however, is that centering the columns of the input data is performed by default within the framework of EVD-based PCA, while it is optional in SVD.

To describe SVD mathematically, let X denote a matrix of n observations by p variables. The singular value decomposition of X is

$$ X=ULA{\prime} $$
(5)

where U, A are respectively n×r and p×r matrices, each of which have orthonormal columns so that UU=I r and AA=I r ; L is an r×r diagonal matrix and r is the rank of X. The columns of U are orthogonal unit vectors of length n called the left singular vectors of X. The columns of A are orthogonal unit vectors of length p called the right singular vectors of X.

Jolliffe [3] states that SVD provides a computationally efficient method of actually finding PCs and that it provides additional insight into what a PCA actually does. He proves the relation between SVD and PCA in that

$$ ULA{\prime}=X{\sum\limits_{x=1}^{p}}a_{k}a{\prime}_{k}=X $$
(6)

where a k , k=(r+1), (r+2), …, p, are eigenvectors of XX corresponding to zero eigenvalues. Essentialy, the right singular vectors A of X are equivalent to the eigenvectors of XX, while the singular values in L are equal to the square roots of the eigenvalues of XX. These correspond to the coefficients and standard deviations of the principal components for the covariance matrix.

Applications in remote sensing

Change detection via PCA

In remote sensing studies, PCA is among the most common change detection techniques [12]. The first newly derived components hold the variance related to unchanged landscape features, while succeeding components may feature an enhanced variance of the changed features. Early works demonstrate the usability of PCA in remote sensing ([1320]). Multi-spectral bands treated as columns build up a data matrix, which is then centered and whose variance-covariance (or correlation) matrix is the basis for the components extraction algorithm. In addition, past literature on the subject, includes various explorations such as using unitemporal [17, 21, 22] and multi-temporal [18, 21, 2328] multi-spectral data sets. These studies clearly depict multi-temporal approaches as more effective in terms of both qualitative and quantitative accuracy.

Other studies consider issues related to computational efficiency, disk storage capacities, and more. For instance, spectral contrast mapping–also known as selective PCA–uses two bands that are only weakly correlated to each other as an input and produces two components from which the second summarizes the changes [23, 25, 29]. Harsanyi and Chang [30] developed a method called noise-adjusted PCA in which they suppress undesired spectral signatures in order to maximize the signal-to-noise ratio of a particular signature of interest. The result is a single component identified as a classification image of the spectral feature of interest. Noteworthy in their work, is the fact that centering was not performed before the transformation. Jia and Richards [4] proposed a segmented and possibly multi-staged PCA to deal with computational load issues as well as biases introduced by bands of very high variances. Limitations of this kind have been mostly resolved through the rapid advancement of computer hardware technology allowing researchers to focus more on accuracy issues. For example, [31] demonstrate a multi-temporal and multi-sensor land-use change detection that involves a hybrid classification scheme.

Image processing via SVD

SVD has been extensively used in various disciplines [11, 32]. Its applicability, though, in digital image processing, relies on the fact that multi-spectral remotely sensed spatial data are naturally redundant [33]. Applications include singal estimation in spectral data [34]; classification; data compression and noise reduction [35, 36]; identification of spectral signatures [9]; and feature reduction [33].

To exemplify, [33] compared the performance of SVD to PCA as a feature reduction step on images containing fewer than 100 bands prior to forest/non-forest classifications. In their study, SVD outperformed PCA in terms of classification accuracy and computational time saving. While most SVD-based applications report higher efficacy, in terms of computation and classification results, [9, 37] noted, however, that SVD can be less robust with increasing noise (low signal-to-noise ratio).

Notwithstanding, the use of SVD in remote sensing studies appears somewhat limited. In the past this was attributed to data storage and processing being expensive, especially for very large datasets [34].

Data and concepts

MODIS daily surface reflectance products

The source data for this work are a post- (20072) and a pre-fire (20063) MODIS Daily L2G surface reflectance products (MOD094). Their spatial resolution is 250 m and their extent covers Peloponnese in Greece (geographical extent: south =36, east =23.6, north =38.5, west=21). These products contain quality filtered and atmospherically corrected observations5.

The data consist of infrared observations extracted from MODIS’ band 2 (0:858Am). The samples concern burned and vegetated surfaces. These were delineated on-display based on cloud-free and top-quality MODIS observations by utilising MODIS’ Quality Data6 7 8 9.

Spectral enhancement of burned area clusters via PCA

PCA can enhance the spectral discrimination between burned and unburned land [16, 21, 23, 24, 38]. The initial poor discrimination of burned observations (pixels) among other classes (such as sparse vegetation and water) is enhanced in some of the higher order principal components. We visualise the rationale behind the technique in bi-dimensional PCA scatterplots (Figs. 1, 2). Regardless, we reiterate that PCA does not isolate distinct clusters–it only enables reviewing data from possibly more useful perspectives.

Figure 1 presents a scatterplot of an untransformed bi-variate data set composed by pre-fire (x-axis) and post-fire (y-axis) surface reflectance observations. The distribution of both the pre- and the post-fire observations are visualised in histograms (light grey coloured) on top of each main axis. Samples of vegetation (dark grey asterisks) and burned areas (black dots) are drawn on top of the surface reflectance observations (grey dots). Their distribution is reflected in mini-histograms (black and dark grey respectively) on top of the larger (grey) histogram in each main axis. The sampled classes overlap entirely onto the x-axis. This is expected as no burn scars exist in the prefire acquisition. However, the samples overlap partially on the y-axis. It is this mixture of spectral information in the postfire acquisition that decreases the separability between the samples in question.

The first step of PCA is to mean-center the input data. While the shape of the scatter remains intact after centering, the coordinate system is shifted to its center. EVD is then applied on the covariance matrix of the mean-centered input data. The transformed data are projected on two new axes. The first principal component (axis rotated by 49.6 degrees, labeled PC1) captures the greatest possible spread of the point cloud. This is the maximum variance of the initial bi-variate data set. The 2nd principal component (axis rotated by 139.6 degrees, labeled PC2) receives a lower amount of data, which corresponds to the second possible maximum variance of the input data.

Likewise as with the input, all transformed observations are represented graphically in offset histograms, yet parallel to the new dimensions. In the firs principal component (PC1), the mini-histogram of the burned area samples overlaps partially with the mini-histogram of the vegetation samples. In addition, both mini-histograms are “inside” the main zone of the overall histogram. We observe the similarity of histograms in PC1 with those in the x-axis of the input data. In the second principal component (PC2), however, a higher degree of separation between the sampled classes is obvious. The burned area samples appear as being outside of the main zone of the larger (grey coloured) histogram. In effect, the histograms in PC2 appear similar to those in the y-axis of the input data. Nevertheless, the mini-histograms appear to be slightly more concentrated.

Figure 2 visualises the EVD-based PCA performed on the correlation matrix of the input data. Essentially this means that the input data has been scaled in addition to mean-centering. Comparing to Fig. 1, the centered but unscaled version, a series of important observations reveal in support to the outlined theoretical concepts. First, the shape of the scatter is altered (more widespread). This is numerically confirmed by the modified range of values [−2,2] versus [−0.5,0.5]. Second, the mini-histograms of the sampled classes are flatten.

Four ways to extract principal components via SVD

Within the framework of PCA, we question the effects of centering and scaling in burned area clusters. We build data composites based on the following versions of the original data matrix (overviewed in Table 1): (A) uncentered-unscaled, (B) uncentered-scaled, (C) centered-unscaled, (D) centered-scaled. Next, we subject these to the application of SVD-based PCA, aiming to identify improved spectral enhancements of burned area clusters.

Table 1 Four variants of PCA based on SVD

Using pre- and post-fire MoDIS bands 2, we visualise the application of the four SVD-based PCA versions in Figs. 3, 4, 5 and 6. Their interpretation may follow a path similar as in the previous subsection. Noteworthy is the identicality of Figs. 1 and 2 to Figs. 4 and 6 respectively. This is a mere verification that a centered SVD equals the conventional EVD-based PCA.

Fig. 3
figure 3

Non-centered and non-scaled SVD-based PCA on a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The point-scatter has been rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal. The 1st component reflects mainly the mean of the original data

Fig. 4
figure 4

Centered yet non-scaled SVD-based PCA on a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The original coordinate system (light greyed) has been shifted after centering. The point-scatter has been rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal. The 1st component reflects mainly the mean of the original data

Fig. 5
figure 5

Non-centered yet scaled SVD-based PCA on a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The shape of the point-scatter has altered (so did the range of values) after scaling and rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal. The 1st component reflects mainly the mean of the original data

Fig. 6
figure 6

Centered and scaled SVD-based PCA on a bi-variate data set composed by a post- and a pre-fire MODIS surface reflectance band 2. Black dots and grey asterisks are respectively samples of burned surfaces and vegetation. The original coordinate system (light greyed) has been shifted after centering. The shape of the point-scatter has altered (so did the range of values) after scaling and rotated after PCA to the direction in which the spread of the data onto the new axes (dashed axes, PC1 and PC2) is maximised. These are restricted to be orthogonal. The 1st component reflects mainly the mean of the original data

Most interesting, however, is the non-centered and unscaled SVD case (A) visualised in figure Fig. 3 on page 19. It is unscaled and therefore the input data’s range remains unaltered. Non-centering does not shift the origin of the input scatter plot. Careful observation hints to more concentrated mini-histograms when comparing them to their equivalents in the rest of the versions.

Implications of pre-processing data prior to PCA

Effects of mean-centering

While both PCA and SVD have been widely used in applied scientific research, most studies do not detail why centering is (within the framework of PCA) or should be (within the framework of SVD) performed at all. Few studies in applied disciplines, most of them dating back some decades ago, touch upon this question in-depth ([8, 3944] or even debate it [4548] when compared to numerous applications that employ PCA without detailing its mathematical background. In addition, the existence of a recent statistical study shedding light on the issue [6], indicates that the question itself is infamous.

The fact that centering is treated as an integral part of PCA could be a reason why many studies have not experimented with the differences and effects of applying the general steps of PCA on the raw data matrix. The contrast between a centered and an uncentered PCA lies in the projection (or spread) of the data on to the principal axes. In a centered PCA the spread is proportional to the variance of the data’s coordinates, while in an uncentered PCA it is simply the sum of the squared distances of the points from the unshifted origin (non-central second moments). Cadima and Jolliffe [6] explain that in an uncentred PCA it is variability about the origin, rather than about the centre of gravity of then-point scatter in \(\mathbb {R}^{p}\), that will be of concern.

Non-centering can, and probably will, break the premise of discorrelation between the extracted components. Rather, the first component will reflect the mean instead of the greatest variance [8]. That is, in case the same set of axes are relevant to the explanation of the variability of all clusters (high within-axes heterogeneity), non-centered PCA results in a single generic component [43].

Nonetheless, cases exist where non-centered PCA distinguishes disjunctions more efficiently by intercepting clusters in decreasing order of importance due to size and homogeneity [8, 30, 41, 42]. According to [41], the influence of variance within a cluster on another cluster can be minimised by not subtracting the mean. Whenever the data form distinct clusters (i.e. clusters have zero, or very small, projections on some subset of the original axes – that is high between-axes heterogeneity), the variability associated with each cluster, is mainly explained by certain axes only. The first few uncentered principal axes pass through (or close to) one of the different clusters, thus allowing a clearer and simpler picture of the data structure.

Effects of scaling

The effects of scaling remotely sensed data, prior to PCA, in terms of subsequent classification accuracy, have been investigated by many including [23, 4951]. Most of these studies conclude that scaling is useful in preventing certain features from dominating the analysis due to their large numerical values [52, 53]. Even recent applications report greater efficacy by using data scaled to unit variance [31, 54].

Nevertheless, [55] noted that in the case of using bands with the same physical units, the derivation of the (principal) components from a non-standardised matrix can be justified. The justification relies on the assumption that the information contained within each band is a function of the precision with which the spectral data are sensed. In order to exemplify this, [56] justify the use of the variance-covariance matrix, which weights each band according to its relative variance and therefore produces components which are more easy to interpret. Likewise, [57] confirm that standardization produces less noisy high order components and decreases omission errors in the subsequent fire scar classification, but at the cost of increased commission errors. Due to their preference for a conservative fire scar mapping (fewer commission errors), they deliberately use the unstandardised PCA which they find to perform excellent in extracting fire scars.

Linking non-centered SVD to remote sensing of burned areas

The structure of a multi-spectral data set may be statistically retouched, without destroying existing clusters, in numerous ways. Factually, some operations alter the distances between clusters. These may increase or decrease the spectral distances between various land cover classes. Others preserve the existing distances and modify the way that principal axes intercept existing clusters. The latter is the case for the conventional form of EVD-based PCA. It involvesdata centering and, optionally, scaling. Thus the data are subject to the implications already discussed.

SVD is a numerically more robust alternative solution to PCA. Hence, it is preferrable for high dimensional data sets. Contrary to EVD, centering the raw data matrix in SVD is optional. In case of distinct clusters in high-dimensional data (high between-axes heterogeneity), avoiding centering, minimises the influence of a cluster’s variance on another cluster. In addition, the first principal component captures mainly the mean of the input data. This is advantageous since burned area samples are, expectedly, below the mean. In support, uncentered principal axes, pass close to one of the different clusters. Consequently, existing data clusters, may be detected more clearly by a non-centered SVD.

In this context, it is worth experimenting to avoid scaling the data. Subtle yet important variations between clusters may be preserved in the derived compoments. Such is the case of burned surfaces. They are indexed by lower values due to higher absorption of solar energy in comparison to other surfaces, especially for a few weeks after the pause of the fires. Because there is little deviation among responses corresponding to the same land category, distinct clusters of data are formed. Burned area clusters feature low internal heterogeneity and are, thus, easier to distinguish in some spectral bands. This is even more pronounced in bi-temporal data sets (composed by pre- and post-fire observations) where the between-axes heterogeneity is even higher.

The theoretical concepts presented in this article are demonstrated numerically via a Multiple Response Procedure [58] and supported by separability metrics in a second article titled Remote sensing of burned areas via PCA, Part 2: SVD-based PCA using MODIS and Landsat data [59].

Conclusions

PCA can enhance the spectral separability of burned surfaces among other land cover features. Burned area clusters present low internal heterogeneity and form distinct clusters. In addition, they are easier to detect in some dimensions of a multi-spectral data set. Further they are absent in some dimensions (i.e. pre-fire data) in multi-temporal composite data set. Therefore, they fit the purpose of a non-centered SVD-based PCA.

Nevertheless, due to the natural redundancy of remotely sensed multi-spectral data, and both EVD and SVD being agnostic about data clusters, it may be complex to interpret the effects of centering and scaling the input data prior to PCA. Thorough examination is necessary to identify optimal data pre-processing options.

Endnotes

1 Also referred as conversion to z-scores, z-values, standard scores, normal scores, standardised variables.

2 MOD09GQ.A2007242.h19v05.005.2007244231200.

3 MOD09GQK.A2006239.h19v05.004.2006241155630.

4 These data are distributed by the Land Processes Distributed Active Archive Center (LP DAAC), located at the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center (lpdaac.usgs.gov).

5 https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod09gq. (Accessed 27 May 2017)

6 For MOD09 products visit: http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/newPage.cgi?fileName=mod09_l2&subdir=v2specs. (Accessed 27 May 2017)

7 For Collection4 MODIS products read more at: http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/detailInfo.cgi?prod_id=MOD09GQK&ver=C4. (Accessed 27 May 2017)

8 For Collection5 MODIS products read more at: http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/detailInfo.cgi?prod_id=MOD09GQ&ver=C5. (Accessed 27 May 2017)

9 For Known issues in MOD09 products (surface reflectance): https://landweb.nascom.nasa.gov/cgibin/QA_WWW/getSummary.cgi?esdt=MOD09type=C5. (Accessed 27 May 2017)