Introduction

Vegetation and waterbody maps are among the land use/land cover (LULC) maps that provide a standard outline of the land surface [1,2,3]. The precise and timely collection of LULC information aids in understanding the evolution and development of human society. Such investigations, in turn, support climate and environmental change modeling studies and reveal the significance of climate change for society. Remote sensing (RS) imaging, Google Earth Engine (GEE), and machine learning (ML) methods have been extensively used for classifying LULC at local and regional scales [4,5,6]. The mid-1970s were a turning point for applying various interpretation approaches that aided in compiling LULC maps and change detection studies [7,8,9]. Owing to the importance of timely LULC information and advanced methodologies, LULC change mapping has evolved over the last five decades [10, 11]. One of the goals of the scientific community has been to generate spatially explicit LULC maps with the shortest possible temporal delay and the best spatial resolution. The rapid development of RS technology has achieved this to some extent compared with conventional surveying and mapping techniques [12,13,14,15]. These techniques benefit from multiple observations and extensive coverage, which makes them highly efficient [16,17,18]. The most important aspect of modern-day planning is assessing LULC change and its implications as a baseline requirement for sustainable natural resource development, planning, and management [19, 20]. Many researchers have claimed that LULC significantly and consistently impacts the functioning of socioeconomic and environmental systems, with substantial trade-offs for sustainability, food security, biodiversity, and the socioeconomic vulnerability of people and ecosystems [21, 22].
Environmental factors such as altitude, geomorphology, hydrology, geological structure, soil type, slope, and technological, socioeconomic, and institutional characteristics are exclusively influenced by LULC classes [23,24,25]. The annual-scale LULC maps in the conterminous United States were created from 1973 to 2000. The dynamic patterns revealed that nearly 8.6% of the geographical area in the U.S. had experienced a change in LULC at least once during the analysis horizon [26]. Urban and rural population development, industrialization, climate change, and deforestation have all immediately impacted the LULC classes on the planet [27,28,29]. This has impacted various geo-environmental and ecosystem issues, including biodiversity, pollution, freshwater, the energy budget, and land use policy [30,31,32]. Moreover, LULC changes significantly affect climate, biogeochemical cycles, energy fluxes, and livelihoods [30, 33, 34].

To create long time-series LULC maps, remote sensing data are acquired and subsequently analyzed to characterize the Earth's land surface [35, 36]. Machine learning algorithms have played an essential role in LULC mapping and change analysis, particularly in optimizing the processing of large image stacks and mining the patterns and information associated with subsequent LULC modifications [37, 38]. Likewise, satellite images with improved resolutions have been more helpful in producing LULC maps with greater detail and accuracy. Regional studies have employed a wide range of techniques, algorithms, and methods to accurately categorize LULC change maps [39, 40]. Landsat images have proven valuable in providing a comprehensive overview of diverse landscapes [41, 42]. Additionally, object detection methods, machine learning classifiers, remote sensing, and geographic information system (GIS) approaches have been implemented to enhance the identification of LULC classes, leading to more accurate and reliable LULC maps [43]. Unsupervised classification, supervised classification, GEE, machine learning programming, and fuzzy classification are the most often used methodologies for image categorization via RS and GIS software [44, 45]. The study in [46] prepared parcel-based LULC variation maps using a machine learning approach. The study in [47] demonstrated future LULC changes and calibration via an impact assessment of LULC changes in two areas of Tehran, Iran. The study in [48] used machine learning models such as support vector machines (SVMs) and artificial neural networks (ANNs) to extract better information and improve the accuracy of LULC change patterns in Kuwait. These methodologies extend beyond LULC classification and have demonstrated successful applications in understanding a wide range of phenomena.
These methodologies have been effectively applied in studying hydro-meteorological variations, climatological alterations, and the impacts of natural and anthropogenic disasters, such as floods and droughts. Their applications have provided valuable insights into sectors such as water resources and agriculture, among others [49,50,51,52]. To summarize, the studies mentioned above provide compelling evidence for the extensive application of GEE, ML, and other related platforms. These findings collectively highlight the robustness and effectiveness of these methodologies in various research domains.

With the advent of cloud computing, several powerful platforms have become available for constructing LULC maps. Among these platforms, the GEE has been used to conduct innovative research with greater ease and computational efficiency [53, 54]. The GEE hosts a large volume of long-term raw remote sensing observations and classification algorithms, such as RF models [55, 56] and support vector machines (SVMs) [57, 58], which make it potentially effective for preparing continuous long-term LULC maps. Many studies have been conducted on the GEE and associated Google Earth platforms, covering global urban land mapping, surface water monitoring, land surface temperature, vegetation index mapping, and LULC mapping with Landsat-8 images [59,60,61,62,63]. Satellite data were combined with indices and machine learning models on the GEE to delineate surface water features in the Thoubal River Watershed, India [64]. LULC mapping is important for understanding land changes and ecological systems, and these studies have produced rapid and precise results based on ML modeling via the GEE platform [65]. Object-oriented LULC extraction was performed on the GEE platform by integrating classifier models such as SNIC, GLCM, and ML [66]. The RF model was used to classify Zambian grasslands based on satellite data and feature importance [67]. Pixel- and object-based approaches were integrated for LULC classification from multi-temporal data using a random forest classifier [68]. Long-term spatiotemporal changes in surface water bodies in the Yellow River Basin from 1986 to 2020 were identified using GEE, ML, and satellite data [69]. The effect of LULC variation on evapotranspiration and ecosystem identification was studied using an ML model classifier via the GEE platform [70].

This study established a rigorous methodology for accurate LULC mapping and the identification of LULC classes utilizing the GEE platform. This study developed and compared the performances of two RF models, namely, RF-50 and RF-100, utilizing satellite data and training samples encompassing seven satellite bands for the Shrirampur area (in India) based on multi-temporal 30 m Landsat-8 satellite images. These models were instrumental in effectively classifying and generating LULC maps for the years 2014 and 2020. In addition, many regions across the globe are experiencing climate-related issues, yet the reasons behind the constant fluctuations in climate and pollution levels on the Earth's surface remain poorly understood. It is evident that various factors, including changes in LULC patterns, significantly impact climate and agricultural fields. In light of these circumstances, accurate and timely LULC maps can be developed with ML models. The present study employed the RF algorithm as mentioned before. The selection of these models allows the investigation of which model yields higher accuracy in classifying images for LULC mapping. This study aimed to gain valuable insights into the relationships among LULC patterns, climate dynamics, and agricultural impacts by adopting these approaches.

The specific objectives of this study were as follows. (1) Develop RF models with 50 and 100 trees to achieve accurate LULC mapping within the study area. By employing advanced ML, we aim to enhance the precision and reliability of LULC classification. (2) Identify changes in LULC by utilizing both the RF-50 and RF-100 models in conjunction with the GEE platform. This approach enables a comprehensive analysis of temporal variations in land use and land cover, facilitating a deeper understanding of LULC dynamics over time. (3) Analyze the effectiveness of the proposed algorithms in accurately identifying individual LULC classes and assess the classification accuracy. By evaluating the performance of the RF models, we can determine their ability to distinguish different land cover categories, thereby providing insights into the strengths and limitations of the classification approach. (4) Compare the models and determine the optimal number of trees and input parameters for LULC mapping using GEE and a machine learning approach. This comparative analysis will inform the selection of the most suitable model configuration and input parameters to achieve high-precision LULC mapping results. By achieving these objectives, the study aimed to offer practical recommendations for LULC mapping in the study area. The outcomes of this research can guide decision-makers in implementing efficient and timely management strategies based on up-to-date LULC information. A comparison between the RF-50 and RF-100 tree models will provide valuable insights for selecting the most suitable approach for accurate LULC mapping.

Materials and methods

Study area and database

The study area is located at 19.62° N, 74.66° E in western Maharashtra, as shown in Fig. 1. The most significant land cover type is farmland. Large patches and plain topography characterize the agricultural area, and dry land is commonly cultivated with sugarcane, wheat, and onion in rotation. The mean temperature, wind speed, and humidity of the study area are 27 °C, 14 km/h, and 70%, respectively. Sugarcane is commonly harvested in December and January, while sugarcane fields are planted between June and February. Wheat is generally harvested in March, and onion is grown between June and August. The annual rainfall is 400 to 550 mm and is mostly confined to the monsoon season (June–October). The basin has highly undulating topography, with the highest elevation being 541 m in the upland locations.

Fig. 1 Location map of the study area

Remote sensing database

The 30-m Landsat-8 top-of-atmosphere (TOA) multitemporal satellite images were used for LULC mapping for 2014–2020; these images were collected and systematically processed on the GEE platform [71]. Landsat-8 has a revisit period of 16 days, so approximately two acquisitions per month are available as time series observations. The Landsat-8 TOA images comprise 12 spectral and thermal bands with spatial resolutions ranging from 30 m to 60 or 100 m, each serving a different purpose. Considering that not all Landsat-8 satellite images have been geometrically and atmospherically corrected [72], the Landsat-8 TOA reflectance datasets under Level 1-C were used in the land use mapping for the period 2014–2020. Furthermore, because of cloud cover, air pollution, and seasonal, intermittent rainfall, only scenes with less than 10% cloud cover were retained when filtering the accessible data through the data download gateway. All Landsat-8 satellite images acquired during 2014–2020 were used for LULC extraction with the 50- and 100-tree RF models. The satellite images were obtained and processed using the GEE platform and the developed algorithm, while data availability was checked in the GEE code editor (Table 1) [73].

Table 1 Specification of the satellite products used

Training and validation datasets

This study collected random sample points for each LULC class by visual interpretation of Google Earth images. To ensure the accuracy and robustness of the image classification, 800 samples were collected for each year, as shown in Fig. 2. The collected samples were divided into training (70%) and validation (30%) sets for both the 50- and 100-tree classification models. These datasets were used to train and validate the 50- and 100-tree RF models in the GEE interface.
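The 70/30 division described above can be illustrated with a minimal Python sketch. It is not the authors' GEE script: the function name `stratified_split`, the fixed seed, the per-class (stratified) splitting, and the synthetic sample identifiers are all assumptions made for illustration, with 200 points assumed for each of the four classes to give 800 in total.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.7, seed=42):
    """Split (point_id, lulc_class) samples into training and validation sets.

    The split is done class by class so that every LULC class keeps
    roughly the same training/validation ratio (an assumption; the
    source only states a 70%/30% division).
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


# Hypothetical sample set: 200 points for each of the four LULC classes.
CLASSES = ["agricultural land", "built-up land", "wasteland", "water body"]
samples = [(f"{name}-{i}", name) for name in CLASSES for i in range(200)]

train, validation = stratified_split(samples)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, which matters when two models (here, 50 and 100 trees) are to be compared on identical training and validation points.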

Fig. 2 Training points (field location-based) in the study area

Image classification protocols

The GEE is a cloud-based platform with a web-based graphical user interface (GUI). It serves as a repository for a vast collection of remote sensing datasets and combines a multipetabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities [35, 36]. Developers use the GEE to detect changes, map trends, and quantify differences on the Earth's surface. The public data archive includes more than thirty years of historical imagery and scientific datasets that are updated and expanded daily. The GEE enables the processing of large datasets through its JavaScript code editor, which shortens the correction and processing of satellite datasets [74]. In this research, the LULCs were classified using the RF-50 and RF-100 tree models in the GEE platform; the workflow included processing, atmospheric correction, cloud-free image creation, filling of missing data and gaps, training, testing, and confusion matrix preparation. We compared the 50- and 100-tree RF models to determine which yielded more accurate LULC mapping based on the machine learning approach and the GEE platform. Cloud-free satellite data for the selected dates were used to improve the discrimination of the various land cover patterns. During image processing, composites were formed to avoid missing information and cloudy images, and the land cover maps were derived from these composites on the GEE platform. We selected datasets for two years, 2014 and 2020, to prepare the LULC maps, analyzed the spatiotemporal distribution of LULC using the RF-50 and RF-100 tree models, and observed the LULC changes in the study area.

The classification procedure was separated into three steps: data collection, segmentation, and classification. Two approaches characterize the conventional LULC classification method, namely, unsupervised and supervised classification. The former is an algorithm-specific classification based on radiometric resolution, whereas the latter is a user-defined classification based on ground truthing. In the present study, the supervised classification algorithm was chosen to evaluate the potential of existing image classification approaches. During classification, ground truthing was performed using the GEE, wherein at least 200 training samples were chosen for each LULC class. To apply the RF model, two basic parameters are needed: the number of trees (ntree) for building the entire forest and the number of features used for node splitting. The study locations chosen for analysis were identified on the basis of satellite images, as presented in Fig. 3. LULC change detection was performed using the RF-50 and RF-100 tree models, and the model that exhibited better results is recommended for effective LULC mapping at the location of interest. A flowchart of the adopted methodology is shown in Fig. 4, and the corresponding error matrices are given in Tables 2a–d and 3a–d.

Fig. 3 The selected sites of the LULC classes from 2014 to 2020: a built-up land of 2014, b agricultural land of 2014, c built-up land of 2020, d agricultural land of 2020, e waste land of 2014, f water body of 2014, g waste land of 2020, and h water body of 2020

Fig. 4 Flowchart of the methodology framework of the study

Table 2 Resubstitution error and confusion matrix for the random forest (RF)-50 model
Table 3 Resubstitution error and confusion matrix for the random forest (RF)-100 model

Random forest machine learning model

The RF is a bootstrap resampling-based ensemble classifier that extracts numerous subsets of training samples from the original data to generate multiple subdatasets [75, 76]. A separate decision tree is then trained on each of the generated subdatasets. The decision trees are treated as classification trees that operate on a set of binary rules to estimate a target value. The RF algorithm computes response variables such as the land cover class by generating numerous decision trees (on the order of hundreds). Subsequently, each object to be classified is passed down every decision tree. The responses from the individual decision trees are evaluated, and the most frequently predicted class is assigned as the target LULC class. Hence, the efficiency of the RF-based model is highly dependent on how the decision trees are created.

In general, the random selection process in the tree formation stage of the RF algorithm is executed in two steps. In the first step, samples are drawn randomly with replacement from the training data. For an individual tree, this subset of the training data is used for decision-making, and the remaining data, known as the “out-of-bag (OOB)” sample, are used for testing the model accuracy. The second step uses a binary rule to determine the split condition at each node. The splitting rule is primarily based on the maximum information gain, the maximum information gain ratio, or the minimum Gini index. As partitioning proceeds, the purity of the nodes increases, i.e., the samples contained in a node increasingly belong to the same category. Once a large number of trees have been generated, they vote for the most popular class; this ensemble of randomized trees is what gives the RF technique its name.
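The two random steps described above (bootstrap sampling with an out-of-bag remainder, and impurity-based node splitting) and the final vote can be sketched in plain Python. This is only an illustration of the general technique, not the GEE implementation used in the study; the helper names `bootstrap_indices`, `gini_index`, and `majority_vote`, the seed, and the sample size of 560 are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns the in-bag indices (used to grow one tree) and the
    out-of-bag (OOB) indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def gini_index(labels):
    """Gini impurity of a node: 0 for a pure node, larger when classes mix."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(560, rng)
# Distinct in-bag samples plus OOB samples always add up to the full set.
print(len(set(in_bag)) + len(out_of_bag))

print(gini_index(["water"] * 4))                                 # pure node
print(gini_index(["water", "water", "built-up", "built-up"]))    # mixed node
print(majority_vote(["water", "built-up", "water"]))
```

With a large training set, roughly one third of the samples are expected to fall out of bag for any single tree, which is why the OOB sample can serve as a built-in test set.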

The RF classifier consisted of a collection of decision tree classifiers, as given below [77]:

$$\left\{ h\left( x, \Theta_{k} \right),\; k = 1, \ldots \right\}$$
(1)

where x is the input vector and {Θk} are independent, identically distributed random vectors.

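The OOB samples are used to estimate the model performance, and the OOB error is regarded as an unbiased estimator of the generalization error. The importance of an input variable is then measured by the mean decrease in accuracy, which is given as follows: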

$$V\left({x}_{i}\right)=\frac{1}{N}\sum_{t=1}^{N}\left({e}_{t}^{i}-{e}_{t}\right)$$
(2)

where \(V\left({x}_{i}\right)\) is the mean decrease in the accuracy score, \(N\) is the number of decision trees in the RF, \({e}_{t}\) is the OOB error of the tth decision tree, and \({e}_{t}^{i}\) is the new OOB error of the same tree computed after altering the values of variable \({x}_{i}\).
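As a small worked illustration of Eq. (2), the following sketch averages the per-tree increase in OOB error after a variable has been altered. The function name `mean_decrease_in_accuracy` and the error values for four trees are hypothetical and are not taken from the study.

```python
def mean_decrease_in_accuracy(oob_errors, altered_oob_errors):
    """Compute V(x_i) = (1/N) * sum over trees of (e_t^i - e_t).

    oob_errors:         e_t, the OOB error of each tree
    altered_oob_errors: e_t^i, the OOB error of each tree after the
                        values of variable x_i have been altered
    """
    if not oob_errors or len(oob_errors) != len(altered_oob_errors):
        raise ValueError("both inputs must be non-empty and of equal length")
    n_trees = len(oob_errors)
    differences = (
        altered - original
        for altered, original in zip(altered_oob_errors, oob_errors)
    )
    return sum(differences) / n_trees


# Hypothetical OOB errors for a forest of four trees.
e_t = [0.20, 0.25, 0.15, 0.20]
e_t_i = [0.30, 0.35, 0.20, 0.35]

importance = mean_decrease_in_accuracy(e_t, e_t_i)
print(round(importance, 3))
```

A value near zero would suggest that altering the variable barely changes the OOB error, i.e., that the variable contributes little to the classification.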

The present study adopted two RF classification models considering the models’ varying decision degrees. The first model consists of 50 decision trees, and the second model consists of 100 decision trees, referred to as RF-50 and RF-100. The training of the supervised classification algorithm was performed individually by the two abovementioned models. Finally, the land cover classification for the two analysis years, 2014 and 2020, was achieved using the two classification algorithms. The overall methodology framework adopted in the present study is described in Fig. 4.

Accuracy assessment for LULC maps of classification

An accuracy assessment of any prepared LULC map is a standard statistical procedure that is essential for obtaining a reliable LULC scenario for the region of interest. The steps followed to accomplish the accuracy assessment are as follows:

1. Initially, 50 random points were generated within the spatial extent of interest with the ‘create random sample’ tool of ArcGIS 10.3.

2. The corresponding LULC classes at the predefined random locations were then extracted and updated in the attribute table belonging to the “random point raster”.

3. The random point raster file was opened in Google Earth Pro, and the individual random points were compared with the ground truth LULC classes derived from Google Earth Pro.

4. Each match between the ground truth LULC class and the image-derived class was assigned a true value.

5. Finally, the overall accuracy was estimated as follows:

    $$\text{Accuracy}_{\text{Overall}}=\frac{\text{Total number of correctly classified points}}{\text{Total number of random points taken}}\times 100$$
    (3)
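The point-by-point comparison in the steps above and Eq. (3) can be sketched as follows. The function name `overall_accuracy` and the example labels (50 points, of which 40 match) are hypothetical and serve only to illustrate the arithmetic.

```python
def overall_accuracy(reference_classes, predicted_classes):
    """Overall accuracy (%) = matching points / total points * 100 (Eq. 3)."""
    if not reference_classes or len(reference_classes) != len(predicted_classes):
        raise ValueError("both inputs must be non-empty and of equal length")
    matches = sum(
        1
        for reference, predicted in zip(reference_classes, predicted_classes)
        if reference == predicted
    )
    return 100.0 * matches / len(reference_classes)


# Hypothetical check of 50 random points: 40 agree with the ground truth.
reference = ["agricultural land"] * 50
predicted = ["agricultural land"] * 40 + ["wasteland"] * 10

print(overall_accuracy(reference, predicted))  # 80.0
```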

A generalized discrete multivariate technique, i.e., the kappa coefficient-based accuracy approach, was further adopted to increase the confidence in the classification accuracy. This approach ensures confidence in the class division within the map because it relates the accuracy actually obtained to the agreement expected by chance. A map with a kappa coefficient in the range of 0.85–0.99 is treated as the product of a better model, and such a model could be adopted for future LULC preparation. The kappa coefficient is calculated as follows:

$$\text{Kappa coefficient}=\frac{\text{Observed accuracy}-\text{chance agreement}}{1-\text{chance agreement}}$$
(4)

where \(\text{Observed accuracy}\) is the overall accuracy expressed as a proportion and \(\text{chance agreement}\) is the sum, over all classes, of the products of the row and column totals of the confusion matrix, divided by the square of the total number of points.
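Equation (4) can be evaluated directly from a confusion matrix, as in the minimal sketch below. The chance agreement uses the standard normalization (the sum of the products of the row and column totals divided by the squared total). The function name `kappa_coefficient` and the 4 × 4 matrix are hypothetical and are not the matrices reported in Tables 2 and 3.

```python
def kappa_coefficient(confusion_matrix):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    total = sum(sum(row) for row in confusion_matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed accuracy: proportion of points on the diagonal.
    diagonal = sum(confusion_matrix[i][i] for i in range(len(confusion_matrix)))
    observed = diagonal / total

    # Chance agreement: sum of (row total * column total) / total^2.
    row_totals = [sum(row) for row in confusion_matrix]
    column_totals = [sum(column) for column in zip(*confusion_matrix)]
    chance = sum(r * c for r, c in zip(row_totals, column_totals)) / total ** 2

    return (observed - chance) / (1.0 - chance)


# Hypothetical matrix for four LULC classes (50 reference points per class).
matrix = [
    [45, 2, 2, 1],   # agricultural land
    [3, 40, 5, 2],   # built-up land
    [4, 6, 38, 2],   # wasteland
    [1, 2, 2, 45],   # water body
]

kappa = kappa_coefficient(matrix)
print(round(kappa, 3))
```

For this hypothetical matrix the observed accuracy is 0.84 and the chance agreement is 0.25, so kappa is noticeably lower than the overall accuracy, which is the reason both measures are usually reported together.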

The points that needed to be validated were generated in the ESRI ArcGIS 10.3 software using the stratified random sampling method of randomly creating control points. The control points of the respective LULC classes were established in the same way as the training polygons were created. The control points were uploaded to the GEE, where the inbuilt classifier package was used to validate the classification. Specifically, the classifier.confusionMatrix() function and the errorMatrix() function of the GEE constructed the confusion matrix and overall accuracy of classification, followed by the ConfusionMatrix.kappa() function for the estimation of the kappa index. The kappa index was estimated for each combination of input parameters, and the combination corresponding to the highest kappa index was selected to determine the overall accuracy.

Results

LULC thematic (spatial) maps and accuracy assessment for the maximum likelihood, RF-50, and RF-100 tree models

The LULC thematic maps of the Shrirampur area for 2014 and 2020 obtained using the RF-50 tree model and RF-100 tree model are presented in Figs. 5 and 6, respectively. The four LULC classes, agricultural land, built-up land, wasteland, and land covered with water bodies, were identified as the essential classes and are the focus of the present investigation. Spatially, the central region of the study site broadly represents built-up land (zoomed region marked as “B”), which is dominantly surrounded by wasteland (region “C”). While the water bodies are limited in the built-up region, their availability is localized primarily in the North and North‒West regions (regions “C” and “D”), with one major water source in the East (region “D”). Agricultural practices are the primary occupation of the study site. They are located at the peripheries of the study boundary (region “A”) in close vicinity with large numbers of micro- and medium-scale water bodies. Qualitatively, significant changes occurred across 2014 and 2020 in the LULC types described above, as evident from the spatial plots; however, the RF-50 Tree and RF-100 Tree models depicted the alterations differently. Therefore, a quantitative analysis was conducted and is presented in Sects. "Advantages, limitations, and future research on the GEE and RF models" to ascertain the essential changes in the LULC classes. This aspect motivated the study to assess each model for the same study site to quantify the models’ suitability.

Fig. 5 LULC thematic map for the Shrirampur region in Maharashtra for a 2014 and b 2020; developed using the RF-50 tree model

Fig. 6 LULC thematic map for the Shrirampur region in Maharashtra for a 2014 and b 2020; developed using the RF-100 tree model

The RF models developed for the present study were assessed for accuracy and suitability for the Shrirampur study site. This was performed by developing a resubstitution error matrix and a confusion matrix for each LULC type in 2014 and 2020. Tables 2a–d and 3a–d show the matrices for the RF-50 and RF-100 tree models, respectively. For the RF-50 tree model, the findings indicated 98% overall training accuracy for both 2014 and 2020, while 78% and 82% accuracy were recorded in the validation phase for the same years (Fig. 7a). For the RF-100 tree model, the findings indicated 99% overall training accuracy for both 2014 and 2020, while 74% and 79% accuracy were recorded in the validation phase for the same years (Fig. 7b; Table 3a–d). The developed RF models and the derived high-accuracy LULC maps were inferred to be highly suitable for conducting spatiotemporal investigations and pattern analysis at the present study site.

Fig. 7 Accuracy assessment of the random forest (RF) models, viz., a the RF-50 tree model and b the RF-100 tree model, using the resubstitution error matrix and confusion matrix

Analysis of RF-50 tree model-estimated spatiotemporal changes and patterns of LULC

To quantitatively assess and compare the spatial and temporal changes in different LULC types in the Shrirampur area, the regions estimated from the RF-50 tree model and the cumulative ratio for each class for 2014 and 2020 were computed and are presented in Fig. 8. It can be inferred that agricultural land cover is the most dominant land use, with an area proportion of 84%. This was followed by area cover under the wasteland occupying 7% of the study site. In general, agricultural land use coupled with wasteland remained less altered in view of individual area proportion alterations between 2014 and 2020. In contrast, the area proportions of built-up land and water bodies underwent significant changes across the same years. For example, the area proportion of built-up land decreased from 6% in 2014 to 4% in 2020, whereas the area proportion of water bodies increased from 3% in 2014 to 5% in 2020.

Fig. 8 Land use/land cover (LULC) change analysis using the random forest (RF)-50 tree model for 2014 and 2020; a showing LULC changes at a glance for both study periods and b and c showing the area proportion of each LULC type in the respective study periods for the Shrirampur area of Maharashtra

Changes in different LULC types are more apparent in Fig. 9, where a drastic positive shift in the area covered with water bodies, as high as 42% (from 25 km2 to 36 km2), was observed. Conversely, a drastic negative change in the built-up area was observed, reaching 26% (from 47 km2 to 35 km2). A comparison of the percentage change between the RF-50 and RF-100 tree models revealed that the area of built-up land decreased, as shown in Fig. 9. We compared the results of both models for built-up land and found that the RF-100 tree model extracted less built-up land. We also found that the RF-50 tree model classified built-up land better than the RF-100 tree model did, with the RF-100 tree model performing slightly worse for built-up land overall. In contrast, the field investigation and qualitative interviews revealed a rising number of households (built-up land) and a reduced water quantity across the study site. In other words, the increase in the area covered by water bodies and the decrease in built-up land area are opposite to the prevailing (general) trends in the Shrirampur region. The RF-50 tree model therefore yielded results that contradict the field evidence (at least in the case of built-up spaces and water bodies). These findings suggested redoing the analysis using the RF-100 tree model to ascertain the suitability of a specific RF model in coherence with the study site conditions. Moreover, it is important to highlight that merely applying an RF model to a study site because its accuracy assessment is satisfactory (as observed for the RF-50 tree model in the present case) is insufficient to determine the different degrees of alteration in LULC. There is a need to develop multiple models of this type (for example, the two models developed for the present study area, viz., the RF-50 and RF-100 tree models), to compare their findings, and, if possible, to couple them with field investigations (ground truthing) to validate the results.
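The percentage changes quoted above follow from the usual relative-change formula, sketched below with the rounded areas given in the text. The function name `percent_change` is hypothetical. Because the areas in the text are rounded to whole square kilometres, the recomputed values (about 44% and −26%) are expected to differ slightly from the reported figures, which were presumably derived from unrounded areas.

```python
def percent_change(area_start_km2, area_end_km2):
    """Relative change (%) between two area estimates."""
    if area_start_km2 == 0:
        raise ValueError("starting area must be non-zero")
    return 100.0 * (area_end_km2 - area_start_km2) / area_start_km2


# Rounded RF-50 area estimates quoted in the text (2014 -> 2020).
water_change = percent_change(25, 36)     # water bodies
built_up_change = percent_change(47, 35)  # built-up land

print(round(water_change, 1), round(built_up_change, 1))
```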

Fig. 9 Comparative analysis of LULC alterations between 2014 and 2020 for the random forest (RF)-50 tree model developed for the Shrirampur area of Maharashtra

Analysis of RF-100 tree model-estimated spatiotemporal changes and patterns of LULC

This section aimed to quantitatively assess and compare the spatial and temporal changes in different LULC types in the Shrirampur area via area estimations generated by the RF-100 tree model. The cumulative ratios for each class for 2014 and 2020 were computed and are presented in Fig. 10. As in the RF-50 tree model, agricultural land cover was the most common land use in the RF-100 tree model, with an area proportion of nearly 84% (a slight increase of 1% between 2014 and 2020 was observed). This was followed by wasteland, which occupied 7% of the study site, as also observed in the RF-50 tree model. In general, agricultural land use and wasteland again remained little affected in terms of individual area proportions between 2014 and 2020. The area under built-up land was also observed to be more or less the same during the study period (4%); however, water bodies changed across the same years, decreasing from 6% in 2014 to 5% in 2020 (against the increasing trend indicated by the RF-50 tree model).

Fig. 10 Land use/land cover (LULC) change analysis using the random forest (RF)-100 tree model for 2014 and 2020; a LULC changes at a glance for both study periods and b and c the area proportions of each LULC type in the respective study periods for the Shrirampur area of Maharashtra

Changes in different LULC types are more apparent in Fig. 11, where a drastic positive shift in area cover under built-up land, as high as 24% (from 34 km2 to 42 km2), was observed. However, a moderate negative change in the area covered with water bodies, which was 6% (from 46 km2 to 56 km2), was observed. These findings are in agreement with the observations drawn from field investigations and qualitative interviews and contrary to the findings of the RF-50 Tree model. Importantly, even though both the RF-50 and RF-100 models satisfied the accuracy assessment criteria, their area demarcation ability and area proportion were observed to vary, thereby resulting in different results for the same study site. In the case of the former, the trend of LULC alterations for water bodies and built-up land was observed to increase and decrease, respectively; however, the trend almost reversed when it was subjected to the latter. For the other two land use types, viz., wasteland and agriculture, their performances were similar. There is a need to quantify the difference in performance between these two models (done in the next section). Furthermore, developing a model for multiple study sites may provide contradictory results whose validation could not be performed unless a multiple model development approach is considered (as happened in the present case with the RF-50 Tree and RF-100 Tree models). Hence, the findings and inferences suggest considering a multiple model development approach for deciding the suitability of RF models alongside conducting accuracy assessments.

Fig. 11
Comparative analysis of LULC alterations between 2014 and 2020 for the random forest (RF)-100 tree model developed for the Shrirampur area of Maharashtra

Discussion

Typewise comparisons between LULC classes generated by the RF-50 tree and RF-100 tree models

This section examines the differing results yielded by the RF-50 and RF-100 models for the Shrirampur study area. The objective is to assess the suitability of the two models by comparing their area estimates. Figure 12 shows the spatial plots of the different LULC types developed using the RF-50 and RF-100 models for 2014 and 2020, which support a visual interpretation of the LULC alterations. The areas of agricultural land, built-up land, wasteland, and water bodies estimated by the RF-50 and RF-100 models are compared in Fig. 13. In general, the area estimates of the two models differed substantially for 2014 and 2020, especially for water bodies and built-up land (as also identified in the previous sections). For example, the area of water bodies estimated for 2014 was 25 km2 with the RF-50 Tree model and 46 km2 with the RF-100 Tree model, a large positive difference of 81% (positive meaning that the estimated area increased from the RF-50 Tree model to the RF-100 Tree model). For 2020, the difference for water bodies was also positive, at 20%. Similarly, for built-up land, a large positive difference of 27% was observed for 2014, and a large negative difference (meaning that the estimated area decreased from the RF-50 Tree model to the RF-100 Tree model) of 21% was observed for 2020. The differences in the area estimates for water bodies and built-up land were thus considerable for both 2014 and 2020. It is therefore worth examining the corresponding differences for the other two land use types to see whether the same pattern holds for the two models.

Fig. 12
Land use/land cover (LULC; spatial plot) thematic map for the different types of LULC in the Shrirampur region in Maharashtra; a built-up land, b agricultural land, c waste land and d water body data shown in 2014 using the RF-50 tree model; e built-up land, f agricultural land, g waste land, and h water body data shown in 2020 using the RF-50 tree model; i built-up land, j agricultural land, k waste land, and l water body data shown in 2014 using the RF-100 tree model; and m built-up land, n agricultural land, o waste land, and p water body data shown in 2020 using the RF-100 tree model

Fig. 13
LULC typewise comparative analysis between the RF-50 Tree and RF-100 Tree models for 2014 and 2020 for a agricultural land, b built-up land, c wasteland, and d water bodies

According to Fig. 13a, c, the area estimates of the RF-50 Tree and RF-100 Tree models for wasteland and agricultural land differed only slightly: the differences were negative, at 5% and 0.7%, respectively, for 2014 and 2% and 6%, respectively, for 2020. This evidence indicates that both the RF-50 Tree and RF-100 Tree models are suitable for estimating the area covered by agricultural land and wasteland and can thus be employed for future land-use investigations in the present study area. However, when the models were used to estimate the area of built-up land and water bodies, the RF-100 Tree model performed better than the RF-50 Tree model. This may be attributed to the LULC characteristics, as the dominant land uses (agricultural land and wasteland) were estimated better by both models than were the minor land uses (built-up land and water bodies). In addition, the better fit of the RF-100 Tree model can be attributed to its slightly higher accuracy compared with that of the RF-50 Tree model. Since the RF-100 Tree model corresponded well with the realistic trend of a declining cover of water bodies coupled with an increasing cover of built-up land (contrary to the findings of the RF-50 Tree model), this study suggests applying the RF-100 Tree model in future investigations.

Advantages, limitations, and future research on the GEE and RF models

The present study adopted the RF classification algorithm to characterize the long-term change in LULC in the study region; RF has the following advantages over existing approaches. RF can process large amounts of data without prior feature selection and helps balance the classification error, thereby providing a high degree of classification accuracy. Because the RF-based methodology requires minimal manual intervention, the model design is simpler and requires less effort from the modeler. The inherent capacity of RF to handle a variety of data, together with its internal estimate of the generalization error, assists in evaluating the individual land use classes and improves the overall classification accuracy. Because the decision trees can be built in parallel, the RF algorithm is computationally fast, which reduces the time required for classification. On the other hand, as an ensemble tree approach (in contrast to boosting and stacking), RF is sensitive to the quality of the training data, and its output can be unstable under slight alterations in the training samples. The investigation also highlighted several important characteristics of GEE. The use of GEE has made the generation of LULC maps highly efficient and organized, overcoming the limitations of previous approaches [78]. In the past, generating LULC maps involved cumbersome tasks such as processing satellite data on local computers. This process was not only challenging but also inefficient, particularly for image processing tasks such as cloud removal, atmospheric correction, and mosaicking. With the introduction of GEE, these challenges have largely been addressed.
GEE offers a cloud-based platform with powerful computational capabilities that streamline the generation of LULC maps. This platform facilitates seamless data processing, including automated cloud removal, atmospheric correction, and mosaicking. As a result, the entire workflow has become significantly more efficient, allowing for faster and more accurate LULC mapping.

Moreover, RF models with 50 and 100 trees were used, despite the commonly cited default number of trees (ntree) being 500, to explore the performance and accuracy of the RF algorithm with different numbers of trees in the context of this study. A default of 500 trees does not necessarily imply that this is the optimal or most effective number of trees in all cases. By selecting the RF-50 and RF-100 tree models, the study could investigate the impact of the number of trees on the classification accuracy and performance of the RF algorithm in the specific application of LULC mapping. This approach allowed a comparative analysis of whether a lower or higher number of trees yields better results for the given study area and data. It thus provides insight into the performance characteristics of the RF algorithm with tree numbers other than the default setting.
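Why the number of trees can matter may be illustrated with a deliberately simplified voting model. The sketch below is not the study's method: it assumes a two-class problem, trees that vote independently, and an identical, hypothetical per-tree accuracy of 0.6. Real RF trees are correlated and LULC mapping is multi-class, so the gains shown here overstate what can be expected in practice.

```python
from math import comb


def majority_vote_accuracy(n_trees, tree_accuracy):
    """Probability that a majority of n independent trees votes correctly.

    Toy binary case; a tie (possible for even n) is counted as a coin flip.
    Intended for a few hundred trees at most, since comb() grows very large.
    """
    if not 0.0 <= tree_accuracy <= 1.0:
        raise ValueError("tree_accuracy must lie in [0, 1]")
    total = 0.0
    for k in range(n_trees + 1):
        # Binomial probability that exactly k of the n trees are correct.
        p_k = comb(n_trees, k) * tree_accuracy ** k * (1 - tree_accuracy) ** (n_trees - k)
        if 2 * k > n_trees:
            total += p_k
        elif 2 * k == n_trees:
            total += 0.5 * p_k
    return total


if __name__ == "__main__":
    for n in (50, 100, 500):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the ensemble accuracy rises with the number of trees but with diminishing returns, which is one reason to compare several tree counts empirically rather than rely on a default.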

GEE is a cloud-based platform for processing and analyzing large satellite datasets that combines extensive storage with robust computation [78]. Users can quickly access satellite images and algorithms interactively to conduct scientific investigations and display spatial and nonspatial data [79]. The GEE platform is currently a leading option for data collection and includes geospatial datasets and satellite images from remote sensing satellites covering more than 40 years, along with some vector data [78]. While GEE has been used extensively to create maps of different types of land use and land cover [80], it is rarely employed in investigations that estimate soil properties. Additional research and development are therefore required on implementing the random forest model in GEE. Running machine learning models in GEE can significantly increase researcher productivity [71], and GEE has been regularly used in scientific analysis in several domains at the local, regional, and international levels [80,81,82,83,84].

The present study has certain limitations that need to be addressed for a more robust understanding. First, the RF-50 and RF-100 tree classifier models were applied to Landsat-8 satellite datasets, in which some bare soil and built-up land pixels exhibited similar spectral characteristics. This similarity resulted in mixed classes during LULC mapping, which affected the accuracy of the RF model [1]. Second, the absence of field data and object data points hindered the proper identification of LULC classes by the RF models. Specifically, the water body and wasteland classes were not effectively classified using the RF-50 tree model on the GEE platform. Increasing the number of trees in the RF model was observed to improve the land use classification and change detection maps. Future studies could integrate popular remote sensing indices, such as the normalized difference vegetation index (NDVI), normalized water index (NWI), and spectral vegetation index (SVI), to enhance the overall performance of image classification. Considering these limitations, future research will focus on developing and testing more comprehensive and suitable machine learning models based on GEE, incorporating more precise LULC mapping methodologies and image classification techniques to enhance accuracy. By leveraging the GEE cloud-based platform and minimizing the reliance on field information, the objective is to generate more precise LULC maps.
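The spectral indices mentioned above are normalized band differences. The following minimal sketch shows the standard NDVI formula and, as an assumption about which water index is meant, the McFeeters NDWI. For Landsat-8 OLI, the near-infrared, red, and green bands are bands 5, 4, and 3, respectively. The function names and the reflectance values are hypothetical and chosen for illustration only.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) for two reflectance values; 0.0 if the sum is zero."""
    denom = band_a + band_b
    return 0.0 if denom == 0 else (band_a - band_b) / denom


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """McFeeters water index (assumed here): (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


if __name__ == "__main__":
    # Hypothetical surface reflectances, not measured values.
    print("vegetated pixel NDVI:", ndvi(nir=0.45, red=0.05))
    print("water pixel NDWI:", ndwi(green=0.08, nir=0.02))
```

Indices of this kind could be appended to the input bands as extra features, which may help separate spectrally similar classes such as bare soil and built-up land.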

Conclusions

This study aimed to analyze the spatiotemporal variation in LULC classes over two distinct analysis periods. Additionally, the performance of the RF machine learning classification algorithm was evaluated with 50 and 100 trees for accurately identifying individual LULC classes. While numerous machine learning models and classification tools are available, they often produce errors and incorrect maps over short periods. To address this issue, this study developed RF models with 50 and 100 trees using seven input bands as variables in the GEE platform. The primary objective was to determine which model performed better when applied to satellite data from 2014 and 2020. By achieving similar accuracies on the 2014 and 2020 satellite data, the RF-100 model has demonstrated its ability to leverage time series datasets. This capacity allows for the creation of LULC maps spanning 30 years within a minimal timeframe. The findings of this study are relevant to sustainable development, as they enable the efficient monitoring of LULC changes and their associated climate effects on the Earth's land surface. These results contribute to a comprehensive understanding of the dynamic relationships among land use, climate, and environmental sustainability. The following conclusions can be drawn from the present study:

  1. While the classification accuracy of the RF-50 model demonstrated significant improvement throughout the entire analysis period, reaching 98%, it is important to critically examine specific findings within the results. One noteworthy observation is the seemingly disproportionate 2% increase in waterbodies over a six-year analysis period, which warrants further investigation and consideration. In addition, another notable observation is the lack of variation in the percentage of agricultural land throughout the analysis period. These findings raise questions regarding the stability of agricultural practices and land use patterns within the studied area during the designated timeframe.

  2. The RF-100 classification algorithm exhibited exceptional performance during the analysis period, achieving an impressive classification accuracy of 99%. This algorithm effectively captured the transitions between wasteland and built-up land, providing valuable insights into the dynamics of these land cover classes. The study emphasized the robust performance of the RF-100 classification algorithm, as demonstrated by its high accuracy and ability to capture fine-scale transitions, underscoring its suitability for comprehensive LULC mapping applications. These findings provide a strong foundation for future research and potential implementation in large-scale mapping endeavors.

  3. The proposed machine learning-based classification algorithms demonstrated improved performance compared to the existing classification approaches. However, slight disagreement was observed between the classifications of bare land and built-up land pixels. To address this issue, the present study recommends incorporating commonly used remote sensing-based indices, such as the normalized difference vegetation index (NDVI), normalized water index (NWI), and spectral vegetation index (SVI), to enhance the accuracy and performance of image classification. This suggestion aligns with established practices in remote sensing and machine learning, where the integration of spectral indices has proven effective in improving the performance of image classification algorithms. These widely used indices supply additional information that can improve the overall robustness and accuracy of the classification model. The use of the proposed RF models with 50 and 100 trees yielded significant improvements in the identification and segregation of distinct LULC classes. Through rigorous evaluation, we assessed the accuracy of the RF-50 and RF-100 tree models and determined their effectiveness in accurately identifying LULC classes using the GEE platform and satellite data. This methodology has broader applicability beyond specific study areas, as it can be adopted in various global regions that encompass diverse LULC classes. By incorporating a greater number of training samples, a substantial enhancement in classification accuracy is anticipated. Increasing the training sample size enables machine learning models to learn more comprehensively from a diverse range of examples, leading to improved discrimination and classification of LULC classes.