Evaluating the reliability of time series land cover maps by exploiting the hidden Markov model

Time series land cover maps are important materials for the work related to land use and land cover change. Satellite remote sensing images prove advantageous in fast mapping with low cost. In most time series land cover products yielded by the satellite remote sensing images, a number of illogical transitions exist between different time phases. The time series land cover products cannot exactly reflect the real land cover types and land cover changes for each pixel. The accuracy evaluation based on the limited ground truth cannot well guide the users because the reliability of different pixels of the land cover products is unknown. A generic model for the reliability evaluation of time series land cover products should be developed based on a strong theoretical frame. In order to better guide the use of the land cover products, this paper proposed an approach to evaluate the reliability of time series land cover products by exploiting the joint probability of hidden Markov model (HMM), in which the classification performance and the spatio-temporal relationships were taken into account. We applied the proposed evaluation method on the time series land cover maps of Poyang Lake Eco-economic Region in China. The reliability of the land cover products was presented by the grading of the joint probability of HMM. The results effectively reflected the classification performance, the spatio-temporal relationships and even the quality of the data source.


Introduction
Time series land cover maps are the basis of change studies at the large scale (Peng et al. 2019;Shimabukuro et al. 2019;Xia et al. 2019). The usability of the land cover products closely relates to their reliability (Corves and Place 1994). Satellite remote sensing images are common for large scale LAND COVER mapping (Homer et al. 2004; Bartholome and Belward 2005). The reliability of the land cover maps produced by time series satellite remote sensing images is quite low in certain areas owing to the classification strategies and the discrepancies of image qualities (Cai et al. 2014). Correct evaluation of land cover products will guide the application of the products, thus reducing the contrasts between the effect of practical applications and the expectation of the users. In other words, the application risk will be reduced. In this sense, the quality evaluation of land cover maps is really important.
There exist various global land cover products, like the International Geosphere-Biosphere Program Data and Information System's Land Cover Product (IGBP-DIS-Cover) (Loveland et al. 2000), the Global Land Cover database for year 2000 (GLC2000) (Bartholome and Belward 2005), University of Maryland Global Land Cover Classification (UMD GLCC) (Hansen et al. 2000), Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 4 Land Cover Product (Friedl et al. 2002), MODIS Collection 5 Land Cover Product (Friedl et al. 2010) and GlobCover (Arino et al. 2008). Most products have low consistencies and continuities (Giri et al. 2005;Oort 2005). Furthermore, the accuracies estimated by the limited ground truths (GT) are generally higher than those verified from the third-party researchers (Chen et al. 2015). For example, a research based on over 2000 GT observations in West Siberia reported extreme low accuracies for IGBP-DISCover and MODIS Collection 4 Land Cover Product (Frey and Smith 2007). Another research found the significant overestimation of cropland cover by the global land cover products in Africa (Fritz et al. 2010). So, the existing time series land cover maps cannot reflect the dynamic land cover changes (LCC) for certain areas or classes.
Most studies on evaluating the reliability of land cover maps focused on calculating the uncertainty of remote sensing images by taking into account the spatial correlations (Foody 2005;Comber et al. 2012;Griffith and Chun 2016;Zhang et al. 2019) and modeling the classification uncertainty based on the classifiers (Loew et al. 2015). Besides, a study gave detailed recommendations for sampling and accuracy assessment of land cover maps (Olofsson et al. 2014). However, the accuracy assessment is usually based on the known samples. In fact, people may not have enough known samples for the areas like the edge between different land cover types. Limited samples do not have sufficient representativeness for large areas (Zhen et al. 2013), and the accuracy assessment based on limited samples should be added more criterions (Foody 2009). Moreover, it is rather difficult to acquire enough ground truth in multiple periods. For the applications based on change studies, most land cover products were produced in time series (Friedl et al. 2010;Yang et al. 2017). A generic evaluation model with strong theoretical frame has not been exploited for the time series circumstances, in which the classification accuracy, the spatio-temporal relationships and the image quality are simultaneously considered.
Statistical learning methods generally establish probabilistic models by data and then forecast data by these models in turn. Labelling/forecast and probability calculation problems are the main applications of these methods (Ionescu and Limnios 1999). There have been studies exploiting the labelling problems to map the time series land cover by remote sensing images (Kasetkasem and Varshney 2002;Wolfe et al. 2015;Abercrombie and Friedl 2016;Gong et al. 2017). The methods in these studies were based on the forecast problem of probabilistic graphical model (PGM) (Jordan et al. 1999). In this problem, the most possible label sequence of the time series land cover was determined by the highest conditional probability or joint probability. However, these methods have not been exploited to evaluate the time series land cover classification results. In fact, by exploiting the probability calculation problem of PGM, we may use the model to assess the reliability of land cover maps. Although a few studies quantified the uncertainty in land cover maps by the probabilistic methods (Li and Zhang 2011;Cripps et al. 2013), a known sample set were still needed to support the frameworks.
Inspired by the previous studies, this paper introduced an approach to evaluate the reliability of land cover maps by taking into account the spatio-temporal context. In this approach, the rules of land cover transitions and the classification probabilities were put in the hidden Markov model (HMM) (Miller et al. 1999). A spatial reliability indicator was designed based on the idea of local binary pattern (LBP) (Ojala et al. 2002). The spatio-temporal relationships and the classification performance were simultaneously employed to calculate the joint probability of HMM under the premise that the traditional 'hidden' state layer (land cover class sequence) was already known. We further exploited the approach to evaluate the land cover maps for circumstances of both time series and single moment. We applied the model on the evaluation of the time series land cover maps of Poyang Lake Eco-economic Region and further discussed the results.

Area and data processing
The study area, Poyang Lake Eco-economic Region, is located in the middle and lower reaches of Changjiang River, Middle China. Most of the study area belongs to Jiangxi Province (Fig. 1). The longitude range is between 114°and 117°, and the latitude range is between 27°and 40°. The total area is 51,100 km 2 . The climate consists of temperate ecosystem and subtropical ecosystem. A large part of the territory is covered by croplands, forests and water body, among which the core lake area is quite stable and other land cover areas relatively change frequently.
The study took advantage of multi-source images to produce the time series land cover maps. The initial satellite images comprised Landsat 5 Thematic Mapper (TM) images (Chander et al. 2009) from year 2007 to 2011, Huanjing Charge Coupled Device (CCD) images (Hu and Tang 2012) for year 2012 and Landsat 8 Operational Land Imager (OLI) images (Roy et al. 2014) from year 2013 to 2015. The Landsat path and row of the study area were respectively 120, 121, 122, 123 and 39, 40, 41. The images, all covering the study area, were georeferenced and the resolution was 30 m. The samples were collected by the field survey and human interpretation from Google Earth. We selected 7308 pixels and recorded their ground information in terms of class label, longitude and latitude for each year. The sample numbers of water, cropland, forests, artificial cover and bare land were 2972, 951, 1939, 1085 and 361, respectively. We used 5-folder cross validation for the accuracy calculation (Kohavi 1995). That means the samples were split into 5 mutually exclusive subsets. Each subset was employed as the test set in turn and the others were employed as the training sets. The accuracy was then calculated by the average values.
To make full advantage of the remote sensing images with 30 m resolution, the pixel-wised compositing approach is commonly used in land cover mapping of large areas (Roy et al. 2010). To overcome the influence of the clouds, the study employed the minimum-value compositing to yield the composited images covering the whole study area. For the pixels covered by different images, the minimum values of the spectral bands were selected. After the images of all the years were yielded, the support vector machine (SVM) classifier (Vapnik and Cortes 1995;Ben-Hur et al. 2000) was employed to produce the land cover maps year by year. The study used the Environment for Visualizing Images (ENVI) software to deal with the classification procedure. Simultaneously, the posterior probability of each pixel was acquired and saved by ENVI.

Generation of hidden Markov model
We will describe the generation of hidden Markov model (HMM) corresponding to time series remote sensing land cover labelling. Suppose C ¼ c 1 ; c 2 ; . . .c N f gis the state set that containing all the possible land cover classes grepresents the observation set that containing all the possible spectral values, in which each element belonging to O is a spectral vector. N is the number of the classes and M is the total number of all the possible spectral vectors. L ¼ l 1 ; l 2 ; Á Á Á l T f g is the label sequence of any pixel for all the T moments, where each element in L belongs to C. S ¼ s 1 ; s 2 ; Á Á Á s T f gis the spectral vector sequence of any pixel for all the T moments, where each element in S belongs to O. A is the transition probability matrix and B is the observation probability matrix where represents the conditional probability that a pixel with Class c i will transfer to Class c j from Time t to t ? 1. And represents the probability that a spectral vector o k will be generated by a class c j or a class c j will be observed by a spectral vector o k at any moment. In addition, an initial state probability vector should be decided as where Consequently, HMM consists of three necessary elements: transition probability matrix, observation probability matrix and initial state probability vector. They are usually expressed by k ¼ A; B; p ð Þ.

Solutions for the reliability evaluation of land cover products
Traditionally, the probability calculation problem of HMM acquires the probability of the occurrence of the observation sequence S when the model parameters k ¼ A; B; p ð Þ and the observation sequence S are fixed: In this case, the hidden label sequence is unknown, and the solution of the problem is generally the Viterbi algorithm (Li et al. 1999;Gong et al. 2017), which is derived from the dynamic programming (DP) algorithm in the operational Research (Howard 1966). However, unlike the traditional probability calculation problem of HMM, the class label sequence (state sequence) is available in the land cover maps and the traditional joint probability of HMM does not meet the purpose of the evaluation.
Another case is to calculate the probability of the occurrence of the state sequence (class label sequence) L (Eq. 8) when we consider the land cover product evaluation problem: where p 1 is the first element of the initial state probability vector. In this case, we cannot accurately evaluate the reliability just by the state sequence and the predetermined transition probability unless the initial state vector and transition probability matrix of the model are so accurate that we do not have to consider the probability of classification and the spatial rationality of the land cover product. In fact, we cannot estimate the objective transition rules of land cover types by a limited number of samples. This study transforms the traditional situation into the calculation of the joint probability of the concurrent appearance of the observation sequence S and the state sequence L. That means the 'hidden' state sequence is known in this problem, and more importantly, the classification performance and the transition matrix will be simultaneously taken into account. As a result, the solution is the multiplication of the transition probability and observation probability belonging to each moment: We have to mention that HMM cannot work without two primary premises. In simple words, the state in any moment only depends on the state of the previous moment, and is independent of the other moments; the observation at any moment only depends on the state at the moment, and is independent of the other states (Rabiner 1989).

Determination of the essential elements
First, in this study, the transition probability matrix represents the transition probability between different land cover types within 1 year. Different land cover types comply with different transformation rules: some areas are relatively stable, like forest cover areas; while others change quite fast, like urban expansion areas (Gómez et al. 2016). We believe it should be close to the physical truths of the study area as much as possible. The physical truths include the location, development planning and the ecosystem etc.
The study area has a relatively higher level of land cover change rate in china (Li et al. 2017). According to the logical transitions and the statistical studies about the transition rate of the land cover in the study area (Hui et al. 2008;Feng et al. 2012;Michishita et al. 2012;Zhang et al. 2014;Li et al. 2017), the rates and types of LCC present differences between different classes, and the conclusions of these studies are not consistent. To give a simplified expression and avoid disputes, we refer to the study in Abercrombie and Friedl (2016) and give a 10% probability for annual land cover change. That means, in this model, the transition probability will be set to 90% when a pixel's class labels of the adjacent two years are the same.
Second, each element in the initial state probability vector is set to be the same in this study because it is multiplied only once, instead of many times according to Eq. 9. Consequently, the magnitude of the final calculation result will not change significantly due to an initial value. So, the initial state probability vector is ignored when we evaluate the relative reliability.
Third, the observation probability values are different for different pixel locations. In order to acquire the values, we first calculate the posterior probabilities which represent the probabilities of the correct classification for each pixel. The probabilities can be acquired from various classifiers (Friedl et al. 2002). In this study, we get the posterior probabilities from ENVI classification output. Then the observation probability values are calculated by Bayes formula (Zadeh 1984).
In addition, the spatial continuity also reflects the reliability of the land cover products. Inspired by the ideas of local binary pattern (LBP) (Guo et al. 2010), we had introduced a spatial reliability indicator taking into account the spatial continuity (Gong et al. 2017). Here we give a more detailed description along with necessary legends. Suppose A is the label of a certain pixel P at any moment. A1, A2, A3, A4, A5, A6, A7 and A8 are the class labels of the 8 neighboring pixels of P. Let n be the number of the neighboring pixels of the same labels with P. We initially define the patterns '1' and '0' for the neighboring pixels. '1' stands for the class label consistent with P and '0' stands for the class label inconsistent with P. Let v be the variation times of the patterns from any directions (clockwise or anticlockwise). The results with the highest n and the lowest v are the most reliable. Then the indicator I is preliminarily defined by The possible patterns and all the values of the indicator are listed by Fig. 2. We give a sample pattern for each possible value of n and v. We set I to 0 when n ¼ 0.
Besides, when n ¼ 8 and v ¼ 0, I is set to 4 which is higher than all the other values. The value of I should be normalized to fit the model and all the values should be divided by 4. Consequently, the range of I is between 0 and 1. This range is suitable for the probabilistic expression. As a result, the indicator is calculated by The higher value of the indicator means the more reliable spatial relationship. In contrast, the low values indicate the land cover patterns are unreliable. Figure 3 illustrates several common patterns corresponding to the possible land cover distributions. In the central lake area, like the first example, the labels of both the central pixel and the 8 neighboring pixels are water. According to Eq. 11, the indicator value is the maximum of 1. For pixels at the boundary between water and farmland (the second example), the spatial reliability is also quite high. In this circumstance, the value of n and v are 5 and 2, respectively. And the indicator value is 5/8. In case of more classes surrounding (like the third example), a relatively lower indicator value is more likely to be yielded for the central pixel. According to the equation, the value of n and v are 3 and 2, respectively. And the indicator value is 3/8. However, since the third example also belongs to a reasonable and objective land cover distribution pattern, the indicator value of the spatial reliability is still higher than many other cases, especially the unreasonable land cover distribution which may exist in any land cover map.
To consider the spatial reliability in the model, the observation probability b ls should be multiplied by I at each moment without destroying the value range of the probability in Eq. 9. The joint probability is then calculated by where I t means the spatial indicator of the tth moment.

Time series evaluation
The evaluation of the time series land cover products is calculated by the joint probability introduced above (Eq. 12). It reflects the overall reliability of the time series land cover products expressed by the probability values.
The elements in the model are all normalized values. That means the probability values of each moment are between 0 and 1. Consequently, the classification performance, temporal relationships and spatial continuities are simultaneously considered in HMM. However, the values of joint probability are too exhaustive. Small gaps in the values cannot tell the differences of the reliability. This study distinguishes the levels of the reliability of the land cover products by the orders of magnitudes. Values within one order of magnitude are deemed to own the same reliability level. The grading results will be landed in Sect. 3.

Single moment evaluation
The evaluation of the land cover map of a certain moment is calculated by the same model in which only one moment is involved. As a result, Eq. 12 is transformed into P L t ; S t ð Þ¼P S t jL t ð ÞP L t ð Þ ¼ p 1 a l t l tþ1 b l t s t I t i ¼ 1; 2; . . .; N; t ¼ 1; 2; . . .; T À 1: However, this form only takes into account the temporal relationship between the target moment and next moment. Obviously, the former moment also has direct relationship with the target moment. In this study, the transition probability from the former moment to the target moment is taken as the weight. Then, the evaluation of the single moment is calculated by P L t ; S t ð Þ¼P S t jL t ð ÞP L t ð Þ ¼ p 1 a l tÀ1 l t a l t l tþ1 b l t s t I t i ¼ 1; 2; . . .; N; t ¼ 2; . . .; T À 1: Specially, for the first moment, a l tÀ1 l t is replaced by a l t l tþ1 ; while for the last moment, a l t l tþ1 is replaced by a l tÀ1 l t . In this way, the evaluation of the first and the last moment is treated equally with other moments.

Reliability evaluation without probabilities
Sometimes, the probabilities are not easy to be acquired for many classifiers. Besides, the probabilities of all the pixels are not always retained in practice. In this case, the reliability can also be estimated by HMM, in which the observation probabilities are set to 1. That means the classifier is recognized to give the most credible output. The joint probability is then calculated by where the spatial indicator I t takes place of the observation probability. We believe the classification performance can be indirectly reflected by the spatio-temporal relationships if the probabilities and the accuracies are unavailable. The evaluation results without probabilities will be shown in the next section. Figure 4 shows the time series land cover maps of the study area. Even the pixel-wised compositing was employed, the strips in the results of certain years cannot be fully removed owing to the quality of the images. However, since this study aims to assess the quality of the products, the classification results are not important. The average accuracies calculated by the ground truth is 90.5%.

Evaluation results of the reliability
The evaluation results of the time series products are presented in Fig. 5. Figure 5a presents the pixel-level distribution of the reliability for two cases. In one case, the (posteriori) probability is available and the results are calculated by Eq. 12; in another case, the probability data is not acquired or retained in practice and the results are calculated by Eq. 15. Figure 5b counts the distribution of both cases. According to the figure, the areas of higher temporal stability, higher spatial continuity and higher classification accuracies are more likely to acquire higher levels. For example, the water body in the center of the lake is not possible to change within several years, and these pixels have higher spatial continuities. Moreover, the separability of water is obviously the highest among all the classes. As a result, these pixels are given the higher levels. In contrast, other areas are not likely to acquire such a high level: the riverbed for the frequently temporal variations, the farmland for the variations and mixed classification with the forests and water, the mountain forests for the uncertain spatial relationship and the classification performance, the urban marginal areas for the speed of the expansion.
The results of two cases (with probability input and without probability input) are close in both visual perspective and statistics. That means the classifier has reported a quite high posteriori probability for most pixels in the land cover maps. And the temporal relationships play an important role in the proposed evaluation scheme of time series land cover maps. Even without regard to the classification performance, the spatio-temporal relationships will reflect the quality of time series land cover maps. maps. For example, the reliability of the central lake area is high, while areas with complex spatial relationships have lower reliability. Tables 1 and 2 present the evaluation levels (expressed by the brightness of the gray level) corresponding to the orders of the magnitude of the joint probability (for example, if a pixel gets the reliability level like 3, that means the joint probability calculated by Eq. 12 is between 10 -7 and 10 -6 ). Note that the elements in the transition probability matrix are multiplied by 100 in the calculation procedure in this study to avoid extremely low values. Additionally, we did not list all the levels, especially for those with the extremely low orders of the magnitude. For time series evaluation, all pixels with the joint probability of lower than 10 -9 will be given the same level of 0. For single year evaluation, all pixels with the joint probability of lower than 10 -7 will be given the same level of 0.

Discussion
The experimental results lead to the following discussions. Specially, the advantages, theoretical comparisons and limitations of the study will be mentioned.

Advantages from users' perspective
The proposed method will reflect the influence of the stripes and the shadows owing to the spatial indicator. The undesirable results of the stripes and the shadows can hardly be expressed by the accuracy. In addition, the marginal areas and the areas of frequent variations will be labelled as a lower level of reliability, for the joint probability is more likely to be low for frequently changing pixels. These advantages will benefit the users in practice. Figure 7 shows the typical regions from the classification results and the evaluation output. The frequently-changing areas like the riverbed and the paddy fields are given the lower levels (Fig. 7a). Also, the misclassification owing to the stripes can be discovered in the evaluation maps (Fig. 7b).
Based on the strategy proposed by this paper, people can use the land cover products selectively as needed in practice. For example, users can only choose the areas with no less than a certain level, or give a weight according to the estimated reliability when they use the products. This is important in the management and policy decision (Fritz   Orders of magnitude

Comparisons in theory
HMM and Maximum A Posteriori-Markov Random Field (MAP-MRF) have both been employed to time series land cover mapping in previous studies (Cai et al. 2014;Abercrombie and Friedl 2016). However, the models have to modify a large number of pixels in the post-classification process. To some extent, if the parameters are not accurate, the models will make the classification results more reasonable, rather than objective and accurate. Moreover, the parameters of the spatio-temporal weights are difficult to be decided. Thereby, we believe the models are more suitable in evaluation than in mapping. Specially, compared to undirected graph models like MRF (Li 2001), the directed graph models [Bayesian Network (Pearl 1988)] like HMM are more suitable, for it can give the joint probabilities of all the moments with much less time cost. According to Gibbs distributions, the joint probability of MRF can be expressed by the energy function (Hazan et al. 2013). However, it can just act on the maximum clique, instead of the whole chains. The previous studies generally considered the spatial relationships in reliability evaluation (Comber et al. 2012;Zhang et al. 2019), but it is rare to consider both temporal and spatial contexts. This study integrated the temporal and spatial relationships into a unified model for the reliability evaluation of time series land cover maps. On one hand, by considering the characteristics of remotely-sensed land cover maps, the study exploited the joint probability calculation problems of HMM and applied a generic model to the problem. Because the probability calculation problem of the model was solved by continuous multiplication, instead of the Viterbi algorithm (Li et al. 1999), it was easy to reproduce. On the other hand, the proposed strategy took into account the spatial and temporal relationships simultaneously without violating the premise of the basic elements and requirements of HMM.

Limitations
In this study, we refer to the previous study to decide the transition probability matrix. A more accurate setting may be different according to the ecological law and the development characteristics of the study area. However, we believe the transition probability matrix reflects the trends of the land cover transitions. Similarly, the spatial indicator also reflects the rationality of the spatial relationships of the land cover types. The single moment evaluation only takes into account the neighboring years, and the influence of the other years is ignored. In this sense, the undirected graph model may also be the applicable model because the time cost of the calculation of the joint probability will be relatively low (Cai et al. 2014).
The evaluation scheme only gives the relative reliability levels calculated by the joint probabilities of the regional land cover maps, instead of the absolute reliability. In other study areas, the order of magnitudes may vary with multiple factors, like the number of moments, the overall classification performance and the regional ecological laws. A criterion should be unified in the future. A possible solution is to select the water body in the center of the lake/ river/ocean as the highest level, because these pixels have a higher identification accuracy in most land cover products. Although this paper provided the evaluation results under different model parameters and application modes, the verification of the evaluation effect is still limited. However, different from the verification of classification accuracy, it is not sufficient to test the whole time series land cover products by limited samples.

Conclusion
The paper contributed to the previous studies on time series land cover mapping, post-classification and statistical learning methods. We introduced a strategy to evaluate the time series land cover products. The spatio-temporal relationships and the rules of local environment were simultaneously taken into account. The evaluation framework did not rely on the ground truth information. Based on the joint probability of HMM, we exploited the models for the time series evaluation, single moment evaluation and the reliability evaluation without probabilities. A nine-year time series land cover maps were employed to apply the proposed method. The evaluation results of the graded joint probability reflected the reliability pixel by pixel. The results demonstrated the importance of the spatio-temporal relationships in the evaluation of time series land cover classification. Meanwhile, the results present the influences of the stripes, shadows and marginal areas in the land cover products. The users of the land cover products will be better guided by the proposed method. Though the paper described a framework to evaluate the time series land cover products by the probability values, some details should be hashed out. For example, the standard of the level may be stipulated according to the length of time series.
Nowadays, different versions of global time series land cover products are in service, and the reliability of these products presents differences in different areas. By using the proposed method, we can equally and objectively evaluate the reliability of these products for any area. It will be contributory to design a unified scheme to evaluate the reliability of the existing global land cover products.
Authors contribution GY is the main author who proposed the basic idea and completed the experiments. SF and WG provided the useful suggestions on designing the approaches involved in our proposed strategy. YZ and MG helped to complete the manuscript. Availability of data and materials The experiments of the study were based on the Landsat images. The data processing was done on https://earthengine.google.com. The images are available on https:// earthexplorer.usgs.gov and https://earthengine.google.com.
Code availability The custom code is not available online now.

Compliance with ethical standards
Conflict of interest The authors declared no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.