Introduction

Several wetland definitions exist, and they vary between regions. The most widely used is that of the Ramsar Convention, which is globally recognized and defines wetlands as “Areas of marsh, fen, peatland or water, whether natural or artificial, permanent or temporary, with water that is static or flowing, fresh, brackish or salt, including areas of marine water the depth of which at low tide does not exceed six meters”. Beyond this definition, other natural and man-made features such as flooded gravel pits, reservoirs, rice paddies, and coastal beaches are also considered wetlands (Hollis 1990). This study adopted the definition in the South African National Water Act (36 of 1998), which defines a wetland as land that is transitional between terrestrial and aquatic systems, where the water table is usually at or near the surface or the land is periodically covered with shallow water, and which in normal circumstances supports or would support vegetation typically adapted to life in saturated soils (Day et al. 2010). This definition was used because it encompasses all the wetland-defining factors listed in the definitions of the countries sharing the Limpopo Transboundary River Basin (LTRB).

Wetlands provide valuable services such as water regulation, biodiversity conservation, carbon storage, and flood mitigation (Day et al. 2010). These ecosystems support a wide array of plant and animal species, including rare and endangered ones, and often have high biodiversity (Carolissen 2022; Van Deventer et al. 2019). They also play a vital role in the hydrological cycle by acting as natural water filters and by regulating flows through the storage and release of water. Changes in land use and land cover significantly affect water quality and quantity within wetlands, altering their hydrological processes (Cheng et al. 2022; Siddik et al. 2022). Given the significance of these systems, monitoring the associated Land Use and Land Cover (LULC) changes is essential for maintaining their environmental and ecohydrological integrity. Monitoring LULC changes in the areas adjacent to wetlands gives insight into the influence of human activities and climate variability on wetland habitats and species diversity. It also helps in understanding the socioeconomic consequences of these changes for local livelihoods, facilitating the development of sustainable land use practices that preserve the ecosystem services provided by wetlands (Marambanyika et al. 2017). By identifying areas undergoing significant LULC changes, conservation efforts can be directed towards mitigating the adverse effects on wetland biodiversity. Overall, monitoring the LULC changes that affect wetlands is instrumental in formulating strategies for sustainable water management (Musasa and Marambanyika 2020).

LULC changes primarily stem from anthropogenic activities such as urbanization, artisanal mining, and the expansion of agricultural lands. These activities have far-reaching environmental implications, such as ecological alterations, biodiversity depletion, and contamination of water and air (Chen et al. 2014; Mokaya et al. 2004). Because of these LULC changes, wetlands experience declines in productivity, water quality, extent, and vegetation health. At the global level, over 60% of wetlands have been degraded as a direct result of human-induced pressures associated with LULC changes (Millennium Ecosystem Assessment 2005). This degradation has resulted in the loss of critical ecosystem goods and services provided by wetlands (Alam et al. 2011; Martínez-López et al. 2014).

Previous studies have mainly examined the impacts of adjacent LULC changes on wetland condition and loss, water quality, and vegetation dynamics at sub-basin level (Singh and Lin 2015; Mohammadimanesh et al. 2018; Chen et al. 2020; Thamaga et al. 2022; Dzurume 2021). These studies have primarily highlighted impacts on protected systems rather than on the numerous other wetland systems occurring in river basins (e.g. vleis and dambos) (Chen and Liu 2015; Thamaga et al. 2022). Consequently, these systems are being displaced by other land cover types, resulting in the loss of the essential ecosystem goods and services they provide to surrounding communities (Musasa and Marambanyika 2020). To address this, it is crucial to understand the spatial dynamics of LULC changes and their relationship to changes in wetland extent. This understanding will inform strategies aimed at mitigating the impacts of LULC changes on various ecosystems, including wetlands, and at preserving critical ecosystem services (Dzurume et al. 2021; Thamaga 2021).

Large-scale monitoring of the impacts of LULC changes on wetland extent poses challenges due to cost, particularly when relying on field-based methodologies. Moreover, accessing all the wetlands at transboundary basin level is highly problematic because permits must be obtained from several jurisdictions. The availability of Earth Observation (EO) data and remote sensing approaches provides unique opportunities to address these challenges and model LULC change dynamics over time. For example, Close et al. (2021) evaluated the potential of Sentinel-2 EO data for analyzing LULC changes in the Wallonia region of Belgium from 2016 to 2018. The results showed significant land cover changes in the area at acceptable accuracies, underscoring the relevance of Sentinel-2 data in LULC change analysis for the region. Singh et al. (2022) analyzed LULC changes at basin level in India using Landsat EO data coupled with a Markov chain model. Their results also showed significant LULC changes with acceptable overall accuracies, demonstrating the relevance of Landsat data in understanding LULC change patterns for the region. Although EO data offer unique opportunities to monitor LULC change dynamics over time and across larger spatial extents, the lack of seamless, cloud-free data during certain times of the year makes continuous monitoring difficult, because cloud cover introduces errors into the classification outputs. Gxokwe et al. (2022) highlighted the problem of obtaining year-round images for assessing LULC for a wetland over a 20-year period: their study had to rely on dry season images only, owing to the lack of cloud-free wet season images for their study area. These dry season images did not fully enhance some wetland features, thus introducing misclassifications in the model outputs. Uchegbulam et al. (2021) assessed the influence of cloud cover proportion on the classification of LULC change dynamics in the Western Niger Delta for the years 2002, 2014 and 2019. The study utilized Landsat 7 images with cloud cover proportions of 0.48% for the 2002 image, 0% for the 2014 image and 38% for the 2019 image. The results indicated an exaggerated increase in the fractional coverage of some land cover classes, such as built-up areas, for 2019, which was attributed to errors introduced by the cloud cover in the image used. This confirms that the lack of cloud-free images at certain times of the year hampers continuous monitoring of LULC change over time. Fortunately, advancements in remote sensing data analytic tools, such as the introduction of cloud computing platforms like Google Earth Engine (GEE), have emerged as a solution. These platforms offer specialized algorithms for filtering, integrating, and processing multi-source and multi-temporal data (Adeli et al. 2020; Gorelick et al. 2017; Mahdianpari et al. 2020; Wang and Yésou 2018), thereby enhancing the monitoring of environmental changes from EO data.

Significant progress has been made in analyzing the dynamics of LULC changes and their relations to changes in wetland extent, thanks to the availability of cloud computing platforms like GEE. Studies such as Mahdianpari et al. (2019, 2020); Shelestov et al. (2017a, 2017b); Shafizadeh-Moghadam et al. (2021); Ji et al. (2020) have successfully utilized the GEE platform to analyze LULC changes and their relationships with wetlands in different regions of the world, achieving reasonable accuracies (> 70%). While these studies have highlighted the relevance of the GEE platform in monitoring the relations between various LULC types and wetlands, their focus was at sub-basin level. It is therefore crucial to expand the monitoring of LULC change dynamics and their relations to wetlands to a larger spatial scale. This broader approach will provide valuable information on the spatiotemporal extent of wetlands at large spatial scales, thus informing LULC change management strategies about the wetland areas that require prioritization (Dube et al. 2023).

This study sought to analyze the long-term and large-scale dynamics of wetland changes in the Limpopo transboundary river basin over a 20-year period (2000–2020) using cloud-based Earth Observation data. In this transboundary basin, there is increasing concern about the deteriorating state of ecosystems, including wetlands, due to LULC changes (Maviza and Ahmed 2020; Mugari and Masundire 2022). Moreover, few studies have used the GEE platform to examine the basin-scale dynamics of LULC changes. The objectives of this study are therefore twofold: (1) to determine spatiotemporal trends of LULC changes by employing advanced machine learning algorithms and analyzing time series data from multiple remotely sensed sources within the GEE platform, and (2) to assess the relationships between LULC transitions and changes in wetland extent. By achieving these objectives, this study aims to fill the knowledge gap, provide valuable insights into the dynamics of LULC changes and their relations to changes in wetland ecosystems, and contribute to informed decision-making and the development of effective management strategies to mitigate the negative effects of LULC changes in the Limpopo transboundary river basin.

Description of the study area

The study was conducted in the Limpopo transboundary river basin (LTRB) (Fig. 1), which is shared by Botswana, Mozambique, Zimbabwe, and South Africa. The basin has an area of 412,938 km² and extends across the northern part of South Africa, eastern Botswana, southern Zimbabwe, and part of southern Mozambique (Chapman 2017; Dzurume et al. 2021). The LTRB experiences a semi-arid climate with annual precipitation ranging from 200 mm/year to 1500 mm/year, averaging around 500 mm/year. The majority (95%) of this precipitation occurs between October and April. Evaporation in the LTRB varies from 800 mm/year to 2000 mm/year, with an average of 1970 mm/year. The hydrology of the LTRB is characterized by several major rivers, including the Crocodile and Marico Rivers in South Africa, the Notwane River in Botswana, and the Shashe River between Zimbabwe and Botswana. These rivers are tributaries of the Limpopo River, which flows through the four countries (Kapangaziwiri et al. 2021; Zengeya et al. 2011). Additionally, several inter-basin transfer schemes contribute water to the LTRB (Chapman 2017; Mosase et al. 2019). The land cover in the basin is predominantly mixed, comprising forested lands, built-up areas, tree cover and mixed grasslands, and croplands (Maviza and Ahmed 2020).

Fig. 1 Location of the Limpopo transboundary basin in Southern Africa (red box indicates the plot area where field-based measurements were undertaken)

Materials and methods

Ground truth data collection

Ground truth data for this study were collected through field-based surveys and by reviewing records of different land cover products from sources such as the South African Biodiversity Institute (SANBI) and the South African Water Research Commission. In situ land cover and land use data were collected during a complementary study between August 28, 2020, and October 30, 2020, with funding restricting collection to the South African side of the basin (area marked by the box in Fig. 1). The in situ data collection period coincided with the dates of some of the images used in this study. A handheld global positioning system (GPS) receiver with an error margin of ± 3.65 m was used to record the geographical coordinates of the various land cover types found in the region. The dataset consisted of 428 ground truth points representing nine land cover classes, as presented in Table 1. The field collection of land cover points was guided by the class descriptions given in Table 4 in the Appendix. Wetlands were identified using indicators such as vegetation types associated with wetlands in Southern Africa (Phragmites australis, Cyperus papyrus, and Oryza longistaminata, amongst others) as well as soil characteristics associated with prolonged moisture (redoximorphic features at 50 cm depth in the suspected wetland area). A stratified random sampling approach was employed, whereby the site for field measurements was divided into plots of 900 m². A minimum of 50 points representing different land cover classes (Table 1) were collected in each plot, based on the guidance provided by related studies such as Mtengwana et al. (2020); Thamaga and Dube (2018); Mudereri et al. (2021). The 900 m² plots were spaced 1 km apart to prevent overlap in the collected samples. Furthermore, the field-collected ground truth points were supplemented by points extracted from the 30 m resolution land cover products provided by SANBI and from reports of the South African Water Research Commission. The secondary data extracted from these sources also aligned with the dates of some of the images used in this study. In total, 1056 points representing the same land cover classes as the field-collected data were extracted. The combined dataset of 1484 points obtained from the field surveys and record reviews was randomly split into 70% for training and 30% for testing the pixel-based Random Forest (RF) model within the GEE platform.
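The random 70/30 partition described above can be illustrated with a minimal sketch. It is not the study's GEE script: the function name `split_points`, the seed, and the placeholder point list are assumptions made purely for illustration (within GEE, a comparable split is commonly obtained by attaching a random column to the feature collection and thresholding it).

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split labelled points into training and testing subsets."""
    shuffled = list(points)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-in for the 1484 combined points: (point_id, class_label)
points = [(i, f"class_{i % 9}") for i in range(1484)]
train, test = split_points(points)
print(len(train), len(test))
```

A purely random split does not guarantee that every class keeps the same 70/30 proportion; a per-class (stratified) split would be one way to enforce that if rare classes such as wetlands are a concern.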

Table 1 Number of training and test points for LULC analysis

Remote sensing data acquisition and processing

Pre-processing of Landsat data

Figure 2 illustrates the processing steps employed to analyze the data. Data from various satellite sensors were obtained from the GEE catalog. These included Landsat 5 TM data representing Periods 1 (2000–2005), 2 (2006–2010), and 3 (2011–2015), Landsat 8 OLI data covering Period 4 (2016–2020), and Sentinel-1 data corresponding to the same period as Landsat 8. Landsat products were chosen due to their long and continuous availability on the GEE platform, which enables continuous monitoring. Sentinel-1 data were specifically incorporated to enhance some LULC classes, such as forested wetlands, which may not be adequately distinguished using Landsat data alone. The number of images obtained per time period is presented in Table 5 in the Appendix, and a multi-year composite was used for each time period. Prior to classification, the remotely sensed Landsat data underwent preprocessing. Initially, the images were clipped to the extent of the transboundary basin boundary. Subsequently, the image stacks for each time period were masked to eliminate clouds and shadows. This was achieved using the quality assurance bands (QA bands) present in the products, which identify pixels affected by instrument artifacts or cloud contamination. In this study, cloud and shadow removal involved computing the QA bits to generate a single-band image containing cloud and shadow scores. Based on this image, cloud and shadow masks were created and applied to the respective image stacks for each time period, effectively eliminating cloudy and shadowed pixels. After cloud masking, the images were mosaicked to produce a composite image for each time period. Following mosaicking, the Landsat 5 and 8 bands were resampled to a 10 m spatial resolution to facilitate integration with the Sentinel-1 data, which varied in spatial resolution from 10 to 25 m. This step aimed to enhance LULC features. The nearest neighbor method was employed to resample the Landsat bands, a choice informed by related studies such as Thamaga et al. (2021); Mtengwana et al. (2020). Subsequently, the resampled images were used to compute the following indices: Normalized Difference Vegetation Index (NDVI), Normalized Difference Phenology Index (NDPI), modified Normalized Difference Water Index (mNDWI), and Normalized Difference Built-up Index (NDBI) (Table 2). NDVI was selected for its sensitivity to photosynthetically active biomass, facilitating discrimination between wetland and non-wetland areas, as well as between vegetated and non-vegetated regions (Liu and Huete 1995). NDPI was chosen for its ability to provide information on vegetation water content by employing a weighted combination of the Red and Short-Wave Infrared (SWIR) bands, thus enabling differentiation between healthy vegetation, non-healthy vegetation, and bare soils (Wang et al. 2017). The mNDWI was preferred over the standard Normalized Difference Water Index (NDWI) because it extracts open water features more accurately and assesses moisture content even in the absence of surface inundation, as reported by Maswanganye et al. (2022). The inclusion of mNDWI in this study aimed to detect wetlands that were not inundated at the time of image acquisition. NDBI was selected for its effectiveness in mapping urban areas (Karanam and BabuNeela 2018). All the computed indices were then added as additional bands to their respective mosaicked images, which were later used in the pixel-based classification process.
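To make the QA-bit masking and index computation above concrete, the following per-pixel sketch may help. It rests on several assumptions that are not stated in the text: the cloud and cloud-shadow bit positions (3 and 4, as in the Landsat Collection 2 QA_PIXEL band), the standard published formulations of the four indices rather than the exact expressions in Table 2, and an NDPI red weight `alpha` of 0.74. The function names are hypothetical.

```python
# Assumed bit positions (Landsat Collection 2 QA_PIXEL); other products differ.
CLOUD_BIT = 3
SHADOW_BIT = 4


def is_clear(qa_value):
    """True when neither the cloud bit nor the cloud-shadow bit is set."""
    cloud = (qa_value >> CLOUD_BIT) & 1
    shadow = (qa_value >> SHADOW_BIT) & 1
    return cloud == 0 and shadow == 0


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def spectral_indices(green, red, nir, swir, alpha=0.74):
    """Indices for one pixel of surface reflectance.

    alpha is the weight of the red band in NDPI (assumed value).
    """
    red_swir_mix = alpha * red + (1 - alpha) * swir
    return {
        "NDVI": normalized_difference(nir, red),
        "NDPI": normalized_difference(nir, red_swir_mix),
        "mNDWI": normalized_difference(green, swir),
        "NDBI": normalized_difference(swir, nir),
    }


# Hypothetical vegetated pixel: keep it only if the QA value marks it as clear.
qa = 64
if is_clear(qa):
    print(spectral_indices(green=0.08, red=0.06, nir=0.40, swir=0.20))
```

On the GEE platform the same logic would be applied to whole images rather than single pixels; the scalar version is used here only because it is easy to check by hand.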

Fig. 2 Sequential steps followed on the GEE platform to process the remotely sensed data utilized in this study

Table 2 Spectral Indices extracted from the SAR and optical remotely sensed data

Pre-processing of synthetic aperture radar data

The study utilized preprocessed Sentinel-1 GRD data. The preprocessing steps align with the guidelines provided in the GEE user’s guide and are similar to those outlined in ESA’s SNAP Sentinel-1 toolbox. These steps include updating orbit metadata, removing GRD border noise, eliminating thermal noise, performing radiometric calibration, conducting terrain correction, converting backscattering intensity to normalized backscattering coefficients, correcting for incidence angle, and reducing speckle. In this study, the preprocessed Sentinel-1 data were first filtered based on metadata properties, namely transmitter-receiver polarization, instrument mode, and orbit properties. The polarizations used were the vertically transmitted, vertically received (σ0VV) and the vertically transmitted, horizontally received (σ0VH) SAR backscattering coefficients. σ0VV was used because of its sensitivity to soil moisture and its ability to discriminate between flooded and non-flooded vegetation, thus enhancing the detection of wetland areas (Adeli et al. 2020). σ0VH was used because it is a cross-polarization, which is produced by volume scattering within the vegetation canopy and has a higher sensitivity to vegetation structure (Adeli et al. 2020). The instrument mode chosen was Interferometric Wide swath (IW) for both ascending and descending orbits. After filtering by metadata properties, the σ0VH and σ0VV data were separated by orbit property (ascending and descending), and mean values were computed for each polarization and orbit property. The computed mean σ0VH and σ0VV values were then merged for each orbit property and used to compute the SAR ratio, the dual polarization Radar Vegetation Index (dual-pol RV) and the Dual Polarimetric Synthetic Aperture Radar Vegetation Index (DPSVI) (Table 2). The SAR ratio was computed because it has proven useful in mapping pasture lands (Nicolau et al. 2021). The dual-pol RV index was selected to help distinguish individual land cover types from one another. This choice is supported by the fact that the Sentinel-1 data available in the GEE catalog provide dual polarization (VV, VH) information, and Li and Wang (2018); Trudel et al. (2012) have demonstrated the successful retrieval of soil moisture information using this index. DPSVI calculates the rate of depolarization based on the vertical dual depolarization index and aids in discriminating between bare surfaces and vegetated areas (Mandal et al. 2020). To evaluate the potential of integrating SAR and optical data for improving classification accuracy, the merged mean values for σ0VH and σ0VV, along with the SAR indices, were added to the resampled Landsat 8 OLI data.
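The derivation of mean backscatter and simple SAR indices can be sketched as follows. This is an illustration under stated assumptions, not the study's workflow: the input values are taken to be in decibels, averaging is done in linear power units, the ratio is taken as VV/VH, and the vegetation index uses the commonly cited dual-pol form 4·VH/(VV + VH). The exact expressions in Table 2 may differ, and DPSVI is left out because its formulation is not given in the text. All function names are hypothetical.

```python
def db_to_linear(value_db):
    """Convert backscatter from decibels to linear power units."""
    return 10 ** (value_db / 10)


def mean(values):
    return sum(values) / len(values)


def sar_features(vv_db_series, vh_db_series):
    """Mean backscatter and two simple indices for one pixel and one orbit.

    Inputs are per-date backscatter values in dB (assumed).
    """
    vv = mean([db_to_linear(v) for v in vv_db_series])
    vh = mean([db_to_linear(v) for v in vh_db_series])
    return {
        "VV_mean": vv,
        "VH_mean": vh,
        "ratio": vv / vh,              # assumed form of the SAR ratio
        "RVI": 4 * vh / (vv + vh),     # assumed dual-pol vegetation index
    }


# Hypothetical time series for a single pixel on the ascending orbit
print(sar_features([-10.0, -10.0], [-20.0, -20.0]))
```

Running the same function separately on ascending and descending acquisitions mirrors the per-orbit separation described above.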

Pixel based image analyses

Pixel-based image analyses were implemented on the mosaicked images. Some studies (Mahdianpari et al. 2020; Amani et al. 2017; Dlamini et al. 2021a) have highlighted the potential spectral mixing and resulting inaccuracies associated with pixel-based classification. However, this approach was chosen because of the computational limitations associated with object-based analysis in large-scale mapping. Object-based analysis is computationally expensive, and the size of the catchment being modeled in this study would have exceeded the computational capacity of the GEE platform, leading to timeout errors (Gorelick et al. 2017; Shafizadeh-Moghadam et al. 2021). Pixel-based analysis therefore helped to keep the computational cost manageable. For the pixel-based analysis, the Random Forest (RF) machine learning algorithm was selected. RF is an ensemble classifier that uses randomly selected subsets of the training data to construct multiple decision trees (Breiman 2001; Breiman 1996; Dlamini et al. 2021b). Each tree contributes a unit vote, and the class with the highest number of votes determines the classification of a particular object (Breiman 2001). The choice of RF was based on its performance in previous studies, such as Adam et al. (2014); Basheer et al. (2022); Dlamini et al. (2021a); Nitze et al. (2012). RF is particularly suitable for handling significant variation within land cover classes and reducing noise in the data, and, unlike parametric algorithms such as the maximum likelihood classifier, it does not require prior knowledge of the data distribution (Slagter et al. 2020). During the implementation of the pixel-based RF model, the 1420 ground truth points representing different land cover classes (Table 1) were randomly divided within the GEE platform using the commonly used split ratio of 70% training data and 30% validation data. The training data were used to sample areas on the input image corresponding to their locations, and these sampled areas, along with the training points, were used to train the RF classifier. The trained classifier was then applied to classify the mosaicked images.
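The unit-vote principle behind RF can be shown with a toy example. Only the voting step is illustrated: the three "trees" below are hand-written rules standing in for trained decision trees, and the thresholds, band names, and class labels are hypothetical.

```python
from collections import Counter


def forest_predict(trees, pixel):
    """Each tree casts one vote; the most frequent label wins."""
    votes = [tree(pixel) for tree in trees]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Hypothetical stand-ins for trees grown on different bootstrap samples
trees = [
    lambda p: "open water" if p["mNDWI"] > 0.0 else "shrubland",
    lambda p: "open water" if p["NDVI"] < 0.1 else "shrubland",
    lambda p: "wetland" if p["VV"] > -12 else "shrubland",
]

pixel = {"mNDWI": 0.3, "NDVI": 0.05, "VV": -15}
label, share = forest_predict(trees, pixel)
print(label, round(share, 2))
```

In a real forest each tree is fitted to a bootstrap sample of the training points and considers only a random subset of the input bands at every split, which is what makes the individual votes partly independent.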

During this process, a search was conducted to determine the optimal values for the input parameters of the RF model, namely mtry and ntree. The mtry parameter represents the number of randomly sampled candidate variables, while the ntree parameter indicates the number of trees generated by the RF model based on different bootstrap samples. The search ranged from one to five for mtry and from five hundred to fifteen thousand for ntree, with an ntree search interval of five hundred. The selection of these ranges was guided by similar studies, including Adam et al. (2014); Simioni et al. (2020). The search resulted in 300 combinations of mtry and ntree values through several iterations, and the combination that yielded the highest overall accuracy was chosen as the optimal parameter set.
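A parameter search of this kind can be sketched as an exhaustive grid search. The sketch assumes a plain Cartesian grid over the stated ranges, which may not match how the combinations were actually enumerated in the study, and it uses a made-up scoring function in place of training and validating an RF model. In GEE, mtry and ntree would presumably correspond to the `variablesPerSplit` and `numberOfTrees` arguments of the RF classifier; that mapping is an assumption.

```python
from itertools import product


def grid_search(score_fn, mtry_values, ntree_values):
    """Return (best_score, mtry, ntree) over all parameter pairs."""
    best = None
    for mtry, ntree in product(mtry_values, ntree_values):
        score = score_fn(mtry, ntree)
        if best is None or score > best[0]:
            best = (score, mtry, ntree)
    return best


mtry_values = range(1, 6)              # one to five
ntree_values = range(500, 15001, 500)  # five hundred to fifteen thousand


def toy_score(mtry, ntree):
    """Hypothetical accuracy surface, not real results from the study."""
    return 0.85 - 0.01 * abs(mtry - 3) - 0.00001 * abs(ntree - 2000)


print(grid_search(toy_score, mtry_values, ntree_values))
```

In practice `score_fn` would train the classifier with the given pair and return the overall accuracy on the validation points, so each evaluation is far more expensive than in this toy example.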

Accuracy assessment

Accuracy assessment in this study involved the use of five metrics: internal and external Overall Accuracy (OA), Producer’s Accuracy (PA), User’s Accuracy (UA), Commission Errors (CE), and Omission Errors (OE). These metrics provide valuable insights into the performance and reliability of the classification results. OA measures the proportion of correctly classified pixels, providing an overall assessment of accuracy. PA and UA indicate the accuracy of individual land cover types, representing the likelihood that a classified pixel corresponds to the actual feature on the ground (Story and Congalton 1986). CE identifies incorrectly classified sites, while OE quantifies the number of sites that were omitted from the correct class on the map (Mtengwana et al. 2020). To implement the accuracy assessment, the remaining 30% of the data (validation data) was utilized to sample regions on the classified image. These regions, along with the validation points, were used as input files in the GEE cloud computing platform to calculate OA, UA, and PA. Additionally, the files were employed to extract the error matrix, which facilitated the calculation of CE and OE.

The performance of the model was evaluated based on the values of OA, PA, and UA. These accuracy statistics were interpreted as follows: accuracies of 60–69% were regarded as average, 70–79% as good, 80–89% as very good, and accuracies above 90% as excellent (Manandhar et al. 2009; Tilahun 2015).
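The metrics and interpretation bands described above can be computed from an error matrix as in the sketch below. It assumes the conventional definitions (PA = correct / reference total, UA = correct / mapped total, OE = 1 − PA, CE = 1 − UA), rows as reference classes and columns as mapped classes, and a two-class matrix with invented counts. The label for accuracies under 60% and the treatment of exactly 90% as excellent are also assumptions, since the text does not specify them.

```python
def accuracy_report(matrix, classes):
    """matrix[i][j] = validation points of reference class i mapped as class j."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"OA": correct / total, "classes": {}}
    for i, name in enumerate(classes):
        ref_total = sum(matrix[i])                       # row sum
        map_total = sum(matrix[r][i] for r in range(n))  # column sum
        pa = matrix[i][i] / ref_total if ref_total else None
        ua = matrix[i][i] / map_total if map_total else None
        report["classes"][name] = {
            "PA": pa,
            "UA": ua,
            "OE": None if pa is None else 1 - pa,
            "CE": None if ua is None else 1 - ua,
        }
    return report


def rate(accuracy_percent):
    """Map an accuracy in percent to the qualitative bands used in the text."""
    if accuracy_percent >= 90:
        return "excellent"
    if accuracy_percent >= 80:
        return "very good"
    if accuracy_percent >= 70:
        return "good"
    if accuracy_percent >= 60:
        return "average"
    return "unrated"  # assumption: no label is given below 60%


# Hypothetical error matrix: 10 wetland and 40 shrubland validation points
classes = ["wetland", "shrubland"]
matrix = [[8, 2],
          [4, 36]]
report = accuracy_report(matrix, classes)
print(report["OA"], rate(report["OA"] * 100))
print(report["classes"]["wetland"])
```

The example also shows why OA alone can mislead: the overall figure is high because shrubland dominates the sample, while the wetland class has a noticeably lower user's accuracy.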

Land use and land cover change analysis

To assess changes in land use and land cover (LULC) within the transboundary basin, spatial analyst tools in ArcGIS 10.3 were employed. The raster maps depicting the spatial distribution of each land cover class were imported into the software and converted into vector files, allowing for further analysis and visualization of the land cover transitions between the different time periods. The Sankey diagram was selected as a suitable visualization tool for this purpose, following previous studies such as Cuba (2015); De Alban et al. (2018); Spruce et al. (2020) that effectively used Sankey diagrams to depict LULC changes across various time periods. The diagram was plotted in the e! Sankey software using the extracted class areas as input, giving an intuitive representation of the flow and magnitude of land cover changes over time. In addition to the Sankey diagram, percentage gains and losses were calculated for each land cover class relative to the total surface area of the transboundary basin. This analysis provided quantitative insights into the extent of change within each class, enabling a comprehensive understanding of the LULC dynamics in the study area.
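The quantities that feed such a diagram, namely class-to-class transitions and net gains or losses relative to the total area, can be illustrated with a small sketch. It is not the ArcGIS workflow used in the study: it assumes two co-registered classified maps flattened to equal-length lists of class labels with equal pixel areas, and the ten-pixel maps are invented for illustration.

```python
from collections import Counter


def transition_counts(earlier, later):
    """Count pixels for every (from_class, to_class) pair of two aligned maps."""
    return Counter(zip(earlier, later))


def percent_replaced(transitions, from_class, to_class):
    """Share of from_class's earlier area that became to_class."""
    start = sum(n for (src, _), n in transitions.items() if src == from_class)
    return 100 * transitions[(from_class, to_class)] / start if start else None


def net_change_percent(earlier, later, cls):
    """Net gain (+) or loss (-) of a class as a percentage of the whole map."""
    return 100 * (later.count(cls) - earlier.count(cls)) / len(earlier)


# Hypothetical ten-pixel maps for two periods
earlier = ["wetland"] * 5 + ["shrubland"] * 5
later = ["wetland"] * 3 + ["built-up"] * 2 + ["shrubland"] * 5

t = transition_counts(earlier, later)
print(percent_replaced(t, "wetland", "built-up"))
print(net_change_percent(earlier, later, "wetland"))
```

Each non-zero entry of the transition table corresponds to one flow in a Sankey diagram, with the flow width proportional to the pixel count (or area) of that transition.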

Results

Accuracy assessment results

Figure 3 presents the results of the overall accuracy assessment for the classification of the mosaicked images representing the different time periods studied (Period 1: 2000–2005, Period 2: 2006–2010, Period 3: 2011–2015, Period 4: 2016–2020). The external and internal OA values obtained in this analysis fell within the ranges of 79–87% and 82–88%, respectively. The highest external OA value of 87% (internal OA = 84%) was achieved for Period 1 (2000–2005), indicating a high level of accuracy in classifying the land cover types during that period. The lowest external OA value of 80% (internal OA = 82%) was observed for Period 2 (2006–2010), suggesting a relatively lower level of accuracy compared to the other time periods. These OA results provide an assessment of the overall performance of the classification model in assigning land cover classes to the mosaicked images, and they demonstrate the effectiveness of the approach in capturing the temporal dynamics of land cover changes within the study area across the different time periods.

Fig. 3 Overall accuracy for each time period (Period 1: 2000–2005, Period 2: 2006–2010, Period 3: 2011–2015, Period 4: 2016–2020)

The bare surface, shrublands, and open water classes generally exhibited higher producer’s and user’s accuracies than the other classes (Fig. 4a), with accuracies above 70%. An exception was the bare surface class, which had a low producer’s accuracy (< 20%) during the 2016–2020 period. Classes such as built-up areas and forests showed higher producer’s and user’s accuracies (> 60%) during the 2000–2005 period, but their accuracies dropped significantly during the 2006–2010 and 2011–2015 periods. Croplands demonstrated acceptable producer’s and user’s accuracies of between 50% and 60% for all time periods, except for the 2006–2010 period, where both metrics fell below 40%. The wetlands class had low producer’s and user’s accuracies, ranging between 30% and 45% for all time periods, and the sparse vegetation class consistently displayed producer’s and user’s accuracies below 20%. These results highlight the varying accuracies achieved for different land cover classes throughout the study.

Fig. 4 Class accuracies, where (a) shows user’s and producer’s accuracies and (b) shows Pontius’ commission and omission errors for the studied time periods

Figure 4b illustrates the results of Pontius’ commission and omission errors for the different land cover classes. Classes such as sparse vegetation and wetlands exhibited high commission and omission errors, exceeding 50% for most of the time periods. These high error values suggest potential misclassification of these classes. Furthermore, the grasslands class had high commission and omission errors, surpassing 60% for the 2011–2015 and 2016–2020 time periods. The forest class also had omission errors above 50% for the 2011–2015 and 2016–2020 periods. On the other hand, classes such as shrublands, croplands, open water, built-up areas, and bare surface had low commission errors (< 30%) for all time periods, except for the 2006–2010 period where built-up areas exhibited a commission error above 50%. These results highlight the varying levels of commission and omission errors across different land cover classes. The classes of sparse vegetation, wetlands, grasslands, and forest had higher error rates, suggesting potential challenges in accurately classifying these land cover types. Conversely, the classes of shrublands, croplands, open water, built-up areas, and bare surface had lower error rates, indicating better classification performance for these land cover types.

Land cover change analysis

Figure 5 depicts the spatial distribution of the different land cover classes based on the pixel-based classification for each time period. The findings reveal that shrublands were consistently the most dominant land cover class across all study periods, with class areas ranging from 3,491,650.28 hectares to 36,548,811.54 hectares. Conversely, wetlands and sparse vegetation were the least dominant classes throughout all time periods. The class areas for wetlands ranged from 14,511.2 hectares to 32,241 hectares, while those for sparse vegetation ranged from 21.33 hectares to 4937.4 hectares. The lowest values for both wetlands and sparse vegetation were recorded for Period 4 (2016–2020) (refer to Table 3 for detailed information).

Fig. 5 Spatial distribution of LULC classes based on the pixel-based classification for the studied time periods

Table 3 Land cover class areas (ha) per time period

Furthermore, the results indicate a consistent expansion of croplands, tree cover, and shrublands from Period 1 (2000–2005) to Period 4 (2016–2020). Built-up areas and bare surfaces also expanded between Period 1 (2000–2005) and Period 2 (2006–2010). However, an anomaly was observed for the built-up area class between Periods 2 and 4, where coverage declined continuously. This anomaly is likely attributable to classification errors, which could have affected the accurate identification of built-up areas. These findings highlight the changing spatial distribution of land cover classes over time, with shrublands consistently dominating the landscape. The increases in croplands, grasslands, and built-up areas reflect land use changes within the study area. However, caution should be exercised when interpreting the anomalies observed, particularly the decline in built-up areas between Periods 2 and 4, as they may be influenced by classification errors.
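The expansion and decline of a class can be expressed as a period-to-period percentage change in its area. The sketch below shows one way to compute this, assuming change is measured relative to the area in the earlier period. The function name and the four area values are hypothetical; they are not the values reported in Table 3.

```python
def percent_change(areas):
    """Percentage change in area between consecutive periods."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(areas, areas[1:])]


# Hypothetical class areas (ha) for four periods; not the values in Table 3.
wetland_ha = [30000.0, 24000.0, 21000.0, 15000.0]
changes = percent_change(wetland_ha)
mean_change = sum(changes) / len(changes)  # average rate across transitions
print([round(c, 1) for c in changes], round(mean_change, 1))
```

A negative value indicates a loss of area between two periods, and averaging the values gives a single rate per class for the whole study window.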

LULC transitions and implications on wetlands extent

Figure 6 presents the land use and land cover (LULC) transitions between the different time periods, with major changes (> 20% loss or gain) labeled along with their respective proportions. Several land cover classes changed substantially over the studied time periods, including wetlands, sparse vegetation, grasslands, bare surface, croplands, built-up areas, and shrublands. Between Period 1 (2000–2005) and Period 2 (2006–2010), approximately 40% of the wetland area was replaced by built-up areas, 61% of sparse vegetation was replaced by tree cover, 40% of grasslands were also replaced by tree cover, and 20% of bare surface was replaced by croplands. During the transition from Period 2 (2006–2010) to Period 3 (2011–2015), major changes occurred between sparse vegetation and tree cover, grasslands and tree cover, bare surface and croplands, and built-up areas and bare surface. These changes involved the replacement of 61% of sparse vegetation by tree cover, 37% of grasslands by tree cover, 25% of bare surface by croplands, and 20% of built-up areas by bare surface. Between Period 3 (2011–2015) and Period 4 (2016–2020), major changes occurred between bare surface and croplands, built-up areas and croplands, wetlands and shrublands, grasslands and tree cover, and sparse vegetation and tree cover. Notably, 36% of bare surface was replaced by croplands, 32% of built-up areas were replaced by croplands, 46% of grasslands were replaced by tree cover, 63% of sparse vegetation was replaced by tree cover, and approximately 30% of the wetland area was replaced by shrublands. The results indicate a continuous decline in wetland areas and sparse vegetation throughout the studied time periods, while croplands, tree cover, and shrublands experienced increases (Fig. 6).

Fig. 6 LULC transitions between the time periods, with major transitions labeled with their respective proportions
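The transition proportions reported above can be thought of as a cross-tabulation of two classified maps. The following sketch illustrates the idea, assuming each map is available as a sequence of per-pixel class labels and that a transition is expressed as a percentage of the source class in the earlier period. The function names and the ten-pixel maps are hypothetical and serve only as an illustration; the 20% threshold follows the definition of a major change used in the text.

```python
from collections import Counter


def transition_percentages(before, after):
    """Percent of each 'before' class that ends up in each 'after' class.

    `before` and `after` are equal-length sequences of class labels,
    one entry per pixel, for two consecutive periods.
    """
    if len(before) != len(after):
        raise ValueError("maps must cover the same pixels")
    pair_counts = Counter(zip(before, after))
    source_totals = Counter(before)
    return {
        (src, dst): 100.0 * count / source_totals[src]
        for (src, dst), count in pair_counts.items()
    }


def major_transitions(percentages, threshold=20.0):
    """Keep only changes between different classes above the threshold."""
    return {
        pair: pct
        for pair, pct in percentages.items()
        if pair[0] != pair[1] and pct > threshold
    }


# Hypothetical ten-pixel maps, for illustration only.
period_1 = ["wetland"] * 5 + ["grassland"] * 5
period_2 = ["wetland"] * 3 + ["built-up"] * 2 + ["grassland"] * 3 + ["tree cover"] * 2
print(major_transitions(transition_percentages(period_1, period_2)))
```

Pixels that keep their class are excluded from the result, so only losses from one class to another are reported.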

Discussion

This study sought to analyze wetland coverage changes in the Limpopo transboundary river basin over a 20-year period (2000–2020) using cloud-based earth observation data. The classification results revealed that bare surface, built-up areas, and shrublands were the dominant land cover classes across all the time periods studied, while wetlands and sparse vegetation had the least spatial extent. The limited representation of wetlands in the classification results may be attributed to the use of dry season images, which was necessary because images with minimal cloud coverage were unavailable for the wet season. In semi-arid areas, small and intermittently flooded systems often blend with the surrounding terrestrial ecosystem, making them less distinguishable in the images (Day et al. 2010; Fang et al. 2019). To mitigate this limitation, the study employed indices such as the mNDWI, which is sensitive to moisture content and can detect moisture even in the absence of surface inundation. Combining the mNDWI with the spectral bands over time therefore allowed the model to detect some wetlands that had no surface water. This approach has also been used successfully by Gxokwe et al. (2022), Du et al. (2016), and Pal and Sarda (2020) to identify moisture content in similar contexts. Additionally, the use of object-based analysis and its contextual characteristics could have further improved the identification of wetland features within the study area. By considering these factors and incorporating relevant indices and analytical techniques, the study aimed to enhance the identification and understanding of wetland extent in the Limpopo transboundary river basin.
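For reference, the mNDWI is the normalized difference between the green and shortwave-infrared bands, (Green − SWIR) / (Green + SWIR). A minimal per-pixel sketch follows. The function name, the reflectance values, and the choice to return 0.0 when both bands are zero are assumptions made for illustration and do not describe the study's processing chain.

```python
def mndwi(green, swir):
    """Modified NDWI for one pixel from green and shortwave-infrared reflectance."""
    total = green + swir
    if total == 0:
        return 0.0  # assumed fallback to avoid division by zero on empty pixels
    return (green - swir) / total


# Hypothetical surface reflectance values, for illustration only.
print(mndwi(0.10, 0.04))   # green exceeds SWIR (moist or inundated): positive
print(mndwi(0.12, 0.30))   # SWIR exceeds green (dry surface): negative
```

Because water and wet soil absorb strongly in the shortwave infrared, higher index values point to wetter surfaces.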

The classification accuracy assessment revealed that the overall accuracies (OA) for all the time periods fell within an acceptable range, between 78% and 87%. However, certain land cover classes, such as wetlands and sparse vegetation, exhibited high commission and omission errors, along with low producer’s and user’s accuracies, during specific time periods. These errors could be attributed to imbalances in the training and validation data, which were influenced by the limited spatial coverage of certain classes, particularly wetlands and sparse vegetation, within the study area. Imbalanced training data can introduce bias during classification, favoring the well-represented classes while leading to inaccuracies in the classification of underrepresented classes. Moreover, the definition of wetland used in this study might have excluded some wetland types that are not regarded as wetlands in South Africa but are regarded as such in other countries sharing the transboundary basin. This would have limited the spatial coverage of wetlands considered in the study and, therefore, the number of training points for this class. Several studies, such as Millard and Richardson (2013), Ustuner et al. (2016), and Amani et al. (2021), have demonstrated the impact of imbalanced training data on classification outputs, where underrepresented classes tend to be classified less accurately than well-represented classes. In this study, the limited spatial coverage of wetlands and sparse vegetation in the Limpopo transboundary river basin (LTRB) resulted in fewer training and validation points for these classes. Consequently, this imbalance may have introduced bias towards classes with more training and validation points, leading to more accurate classification for those classes and potentially reducing the accuracy of wetland and sparse vegetation classification.
Thus, it is important to acknowledge this limitation in the study, which could have influenced the accuracy assessment of these specific land cover classes. The findings of this study hold significant implications for environmental planners and conservation specialists, shedding light on the declining state of semi-arid wetlands in the Limpopo transboundary river basin (LTRB). These insights serve as crucial baseline information necessary for the development of effective strategies aimed at mitigating the adverse impacts of land use and land cover (LULC) changes on wetlands in the region.
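The link between limited spatial coverage and few training points can be illustrated with a simple proportional allocation of sample points by class area. This is a sketch under stated assumptions: the study's actual allocation scheme is not described here, the function name and the area figures are hypothetical, and the optional `minimum` floor is an illustrative mitigation rather than a step reported by the authors.

```python
def allocate_samples(class_areas, total_points, minimum=0):
    """Allocate sample points to classes in proportion to area.

    `minimum` optionally guarantees a floor per class; it is an
    illustrative mitigation, not a step reported by the study.
    """
    total_area = sum(class_areas.values())
    return {
        name: max(minimum, round(total_points * area / total_area))
        for name, area in class_areas.items()
    }


# Hypothetical class areas (ha), for illustration only.
areas = {
    "shrublands": 800000,
    "croplands": 150000,
    "wetlands": 10000,
    "sparse vegetation": 1000,
}
print(allocate_samples(areas, 1000))              # rare classes get very few points
print(allocate_samples(areas, 1000, minimum=50))  # floor raises rare-class counts
```

With a floor applied, the allocated points can exceed `total_points`, so the total sampling effort would need to grow accordingly.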

The LULC change analysis indicates an increase in the extent of shrublands, tree cover, and croplands, while wetlands and sparse vegetation exhibit a consistent decline. Notably, approximately 40% of the wetlands have been converted to built-up areas, suggesting that the ongoing decline is primarily driven by anthropogenic activities associated with urban expansion. Urban expansion involves major construction and development activities, which include excavation, filling, and drainage of wetland systems. These activities cause major destruction of wetlands and thus reduce their extent. This is most probably the case in our study, where a large proportion of the wetland area was converted to built-up area, particularly between Period 1 and Period 2. This finding aligns with previous studies conducted within the Limpopo transboundary river basin (LTRB) at sub-basin level. For instance, Thamaga et al. (2022) analyzed the impacts of land use and land cover changes on unprotected systems in one of the sub-basins located on the South African side of the LTRB. The study revealed that urbanization emerged as a major driver of wetland loss, corroborating the findings of the current study. Similarly, Sibanda and Ahmed (2021) analyzed and predicted changes in the LULC of the Shashe Catchment, a sub-basin of the LTRB situated on the Zimbabwean side, for the period between 1985 and 2015 using Landsat series remotely sensed data. Their findings indicated major changes between wetlands and built-up areas and between wetlands and cropland, which corroborates at catchment scale what we found at regional scale. Further, Sibanda and Ahmed (2021) predicted a 40% shrinkage of the wetland areal extent due to the expansion of croplands.
This prediction agrees with our finding that 40% of the wetland areal extent was lost at LTRB scale; however, in our case the loss was due to the expansion of built-up areas rather than croplands. Similar trends have been observed in other semi-arid regions of Africa and beyond. Marambanyika et al. (2017) and Chikodzi and Mufori (2018) reported that land use and land cover changes, particularly those driven by urbanization, were leading to wetland loss. Although these studies focused on smaller scales, their findings are consistent with the results obtained in our study. During Period 4, 35% of the wetland area was replaced by shrubland. This may be due to fires, which are likely driven to a great extent by human habitation and which cause resprouting of certain types of woody vegetation, including shrubs, in various ecosystems including wetlands (Bond and Midgley 2001; Teixeira et al. 2020). Timmins (1992) reported the resprouting of woody shrublands in some wetlands in New Zealand after a fire event in the area. The replacement of some wetland area by shrublands can also be attributed to climate change impacts, since elevated CO2 levels favour the growth and expansion of shrublands and grasslands in wetland areas (Archer et al. 1995).

Another major land use and land cover change established in this study is the conversion between bare surface and croplands. This phenomenon can be attributed to the cycle of crop planting and growth stages. At certain periods, the crops may not have fully grown, resulting in low canopy cover and exposed bare surfaces. This has also been reported by Ziter et al. (2019) and by Sibanda and Ahmed (2021). The change between wetlands, grasslands, and shrublands can be explained by the prolonged availability of moisture in wetland soils, which facilitates the growth of different vegetation types, including grasslands and shrublands. This is particularly relevant in the context of the study area, which is characterized by a semi-arid climate (Day et al. 2010).

It is worth emphasizing the importance of considering the conservation status of small and intermittently flooded wetlands, which are often overlooked due to their size and intermittent nature (Chen et al. 2013). Despite their modest dimensions, these wetlands play a vital role in providing essential ecohydrological services to the surrounding communities (Carolissen 2022). Therefore, understanding and addressing the LULC changes affecting these wetlands are of paramount importance. The successful management of wetlands at a regional scale, particularly in transboundary basins, necessitates comprehensive and integrated data collection, as well as the availability of reliable information for informed decision-making. However, the challenges of conducting extensive data collection and generating dependable information often impede effective wetlands management strategies, primarily due to limited resources. This study fills an important gap by offering robust, cost-effective, and efficient methodologies based on remote sensing techniques. These methodologies provide reliable data and information, enabling a more precise understanding of LULC changes and their impact on wetland extents within larger regional scales. By employing these methodologies, policymakers and managers can make informed decisions and develop effective LULC change management strategies to safeguard and manage wetland ecosystems (Musasa and Marambanyika 2020). In summary, this study contributes valuable insights and methodological advancements to support the formulation of comprehensive wetlands management strategies. By harnessing remote sensing technologies, this research offers an efficient and reliable approach to inform decision-making processes, particularly regarding the extent of wetlands and their response to LULC changes at larger regional scales.

Limitations of study

This study has provided valuable insights and methodological advancements, but it is important to acknowledge its limitations. Firstly, the findings may be influenced by the limited spatial coverage of certain land cover classes, such as wetlands and sparse vegetation, in the Limpopo transboundary river basin (LTRB). This spatial limitation could result in some classes having few samples collected during the stratified random sampling, thus introducing bias in the classification results and affecting the accuracy of the findings for these specific classes (Gxokwe et al. 2022). Additionally, classification errors may be present despite efforts to ensure accuracy, stemming from limitations in the remote sensing data, image interpretation, or the chosen classification algorithm. The study also faced the challenge of relying on dry season images, which may have limited the detection and classification of wetland areas, particularly small and intermittently flooded systems. Moreover, conducting comprehensive data collection and analysis at a larger regional scale, such as transboundary basins, is hindered by limited resources and data availability. Therefore, it is crucial to consider these limitations when interpreting the study’s results and applying them to real-world management and decision-making processes.

Conclusion

This study utilized large-scale, cloud-based earth observation data to analyze land use and land cover change dynamics on wetland systems in the Limpopo transboundary river basin over a 20-year period (2000–2020). The findings revealed the presence of nine distinct land cover classes, including tree cover, shrublands, croplands, grasslands, wetlands, sparse vegetation, open water, bare surface, and built-up areas. Shrublands emerged as the dominant class, covering approximately 76–82% of the study area throughout the analysis period. On the other hand, wetlands and sparse vegetation were the least dominant classes, covering 0.9–2% and 0.04–0.3% of the study area, respectively. The study observed a decline in wetland and sparse vegetation areas, with average reductions of 19% and 44% over the 20-year period, respectively. Conversely, shrublands, croplands, and tree cover exhibited increasing trends, with average rates of 0.4%, 12.4%, and 4.25%, respectively. The study’s findings provide crucial insights into the state of semi-arid wetlands in the LTRB. These insights are of great significance to environmental planners and ecosystem management specialists, as they serve as baseline information necessary for developing effective strategies to mitigate the adverse impacts of LULC changes, particularly on semi-arid ecosystems such as wetlands in the LTRB.