Introduction

Demographic projections suggest that Central and Southern Asia is poised to become the world’s most populous region by 2037 [1]. Furthermore, India surpassed China to become the most populous country in 2023, and this demographic trend is expected to persist for several decades [2]. The unrestrained expansion of built-up areas is driven largely by a substantial increase in population, which ultimately leads to land use land cover (LULC) changes [3,4,5].

The significant characteristics of urban sprawl are a rapid decrease in vegetated areas [6, 7], random and unplanned growth [8, 9], increased economic activities at higher elevations [10,11,12], land cover change in agricultural areas [13,14,15,16], and an intensified urban heat island effect [17,18,19]. These have created environmental, ecological, economic, and social challenges [8]. The geographical and climatic changes occurring in Himalayan cities call for special attention because of the geo-morphological, topographical, and seismic constraints [7, 10, 20, 21]. Thus, monitoring the spatio-temporal expansion of cities and accurately predicting LULC change are vital for implementing ecosystem conservation and sustainable development strategies in these regions [22]. According to the year-wise records shared by the Department of Economics and Statistics, State Government of Himachal Pradesh, India, the class III cities of the state, which have a population of less than 50,000, were found to be more vulnerable to urban sprawl owing to saturation in the capital city, Shimla. There is thus a pressing need to balance economic development with sustainable environmental practices.

The integrated use of remote sensing and GIS has helped immensely in the management of land and natural resources and in understanding the complex linkages between spatial patterns and processes responsible for change [7, 23,24,25]. Thus, the modeling and accurate prediction of urban sprawl has been inviting the attention of various researchers [26, 27], and the use of modern self-learning algorithms has further improved the accuracy of these models [28,29,30,31]. The understanding of dynamic changes occurring in the region and the incorporation of driving factors also improves the accuracy of these models [26].

Cellular automata (CA)-based models are spatially explicit models (SEM) that work on the simple premise that the future state of a land cover type depends on the past local interactions between the different land covers [22, 26]. The model’s popularity in GIS grew immensely in the 1980s, catalyzed by pivotal contributions from Wolfram [32], Batty and Xie [33], and Batty et al. [34]. The accuracy of the model depends upon the temporal scale of the maps, the neighboring cells, and the transition rules [35, 36]. Batty et al. [34], Leao [37], and Lagarias [38] found CA models to be powerful spatial dynamic models. Their open structure, simplicity, good spatial resolution, and ease of integration with other knowledge-driven models make them an appropriate choice for urban sprawl studies [22, 26, 35, 39]. However, the model depends upon spatial data only and is limited in its ability to incorporate driving forces, which are important for simulating complex processes accurately [22, 26]. Non-uniform cell space, dynamic neighborhood classes, and non-stationary transition rules offer opportunities to modify the original CA structure and make it applicable to real-time complex urban sprawl studies [22, 35]. This makes it necessary to integrate CA with other models.

To address the inherent constraints of the individual models, various researchers have employed hybrid models such as the CA–Markov model [40] and the CA-ANN model [41]. Integrating spatial patterns with the processes responsible for changes in landforms is imperative for the accurate prediction and modeling of land cover changes [24]. Artificial neural networks (ANN) can identify and analyze the complex inter-relationships between causative factors and complex patterns [26, 42]. The architecture of an ANN loosely mimics the behavior of the human brain and nervous system [43,44,45]. An ANN can deal with incomplete data, does not assume a distribution for the input data, and can detect potential inter-dependencies between driving factors [46, 47]. The multi-layer perceptron (MLP)-ANN, which consists of an input layer, hidden layers, and an output layer, is the most widely used ANN model because it is fast, accurate, and able to infer and forecast outcomes from inputs that it has not encountered previously [48]. Researchers have employed CA-ANN models to address spatial-dynamic complexities and driving factors, enhancing the robustness and realism of the modeling for accurate prediction and estimation of land cover changes [18, 39, 42, 49, 50].

The study aims to model LULC change using MLP-ANN and cellular automata simulation in the city of Dharamshala, one of the fastest-growing cities in the state of Himachal Pradesh, India. The results are expected to act as a road map for urban planners and policymakers for the sustainable development of the city. The research used the MOLUSCE plugin to predict and assess the transformations occurring in each LULC type in the study area. The LULC maps of 2016 and 2019 were used as inputs to the model to simulate and validate the LULC map of 2022, and thereafter, the LULC maps of 2025 and 2040 were predicted.

Study area

The study area is Dharamshala, situated in the state of Himachal Pradesh, India, as illustrated in Fig. 1. Located in the Western Himalayas, the city lies on the southern slopes of the Dhauladhar range, the principal mountain range of the region (Gupta et al. [51]). The study area extends from 32° 9′ 52″ N to 32° 15′ 58″ N in latitude and from 76° 17′ 22″ E to 76° 23′ 09″ E in longitude, covering 42.7 km². Elevation ranges from 790 m in the southwest to 2130 m above mean sea level (AMSL) in the north. The region has a humid subtropical climate with a mean annual temperature of about 19.1 ± 0.5 °C. Temperatures peak in June, with an average of 32 °C, and are lowest in January, with an average of 10 °C. The northern parts of the region also receive heavy snowfall during winter. Geologically, the region forms part of the Outer Himalayas and is composed predominantly of sandstone with alternating bands of clay, shale, and siltstone (Gupta et al. [51]).

Fig. 1 Study area, Dharamshala city

The city is the winter capital of the state of Himachal Pradesh and the headquarters of the Central Tibetan Administration. It is a famous hill station that attracts both national and international visitors, and it is also the administrative headquarters of Kangra district. The city was declared a municipal corporation in 2015 through the merger of 9 adjacent villages and has witnessed rapid urbanization ever since. It is one of the 100 cities in India, and the only city in the state of Himachal Pradesh, chosen in 2016 to be developed under the National Smart Cities Mission of the Government of India.

A dramatic rise in urban spaces has been witnessed in the city from 2016 onwards, and these recent changes call for scientific assessment. The time scale chosen for the study corresponds to the period of maximum socio-economic change in the city, driven by the formation of the municipal limits, the hosting of international cricket matches, and the city’s role as the residence of His Holiness the Dalai Lama.

Methods

The correctness of the simulation is determined by the quality of the data and the criteria used in the investigation [26, 35, 39]. The month of May is characterized by sunny days with little or no rainfall in the region; thus, all the temporal satellite imageries were chosen from this month to minimize phenological effects and cloud-contaminated pixels [52]. The ancillary data included a draft town and country planning (TCP) report of Dharamshala city and ground truth points (collected using GPS) for assistance and validation in image classification.

The study incorporated LULC maps of 2016, 2019, and 2022 and a digital elevation model (DEM), the details of which are given in Table 1. Multi-temporal Landsat 8 Operational Land Imager (OLI) satellite imageries for the years 2016, 2019, and 2022 were used, as described in Table 2. A hybrid approach, in which a Maximum Likelihood Classifier (MLC) was followed by post-classification improvement using vegetation indices, was used to create the LULC maps of 2016, 2019, and 2022; each map attained an overall accuracy above 85%, with kappa hat showing substantial agreement. The Maximum Likelihood Classifier was selected because of the topographical challenges and the spectrally homogeneous attributes of the land cover classes under investigation. The land cover classes were further corrected through visual interpretation using high-resolution satellite imagery obtained from Google Earth and PlanetScope [53, 54].

Table 1 Summary of datasets
Table 2 Description of satellite imageries used in the study (source: USGS Earth Explorer)

The riverine sources in this part of the Himalayan region are characterized by the presence of boulders and cobbles; thus, spectral overlap between built-up areas and water bodies was likely. The Strahler order algorithm available in SAGA was used to delineate the water bodies accurately.

Various researchers have included slope, elevation, and aspect as geospatial parameters; population density as the socio-economic parameter; and spatial variables such as distance from water bodies, roads, built-up areas, and the center of town for simulation [18, 30, 31, 39, 42, 49, 50]. After checking different combinations of socio-economic and physical factors, the simulated LULC map of 2022 showed the best performance with six parameters: slope, elevation, distance from streams, distance from roads, distance from built-up areas, and distance from the center of town. The explanatory maps in shapefile (shp) format were first converted to rasters, and Euclidean distance was then calculated in QGIS to produce the distance rasters in GeoTIFF format.
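The Euclidean distance step can be illustrated with a minimal sketch. It is not the QGIS tool used in the study: it is a brute-force pure-Python version intended only for small grids, and the function name, the toy road mask, and the 30 m cell size (the nominal Landsat 8 OLI multispectral resolution) are assumptions made for illustration.

```python
import math

def euclidean_distance_raster(mask, cell_size=30.0):
    """Distance (in map units) from every cell to the nearest feature cell.

    `mask` is a 2-D list where truthy cells mark the feature (e.g. a rasterized road).
    Brute force: adequate for a toy grid, far too slow for a real raster.
    """
    rows, cols = len(mask), len(mask[0])
    features = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not features:
        raise ValueError("mask contains no feature cells")
    return [
        [cell_size * min(math.hypot(r - fr, c - fc) for fr, fc in features)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical 4 x 4 rasterized road layer (1 = road cell).
roads = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dist = euclidean_distance_raster(roads)
for row in dist:
    print([round(v, 1) for v in row])
```

Road cells receive a distance of zero, and every other cell receives the straight-line distance to the closest road cell, which is the quantity the distance-based explanatory maps encode.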

The methodological workflow for the area under investigation is summarized in Fig. 2. The MOLUSCE plugin available in QGIS 2.18 was used for the simulation of land cover change in 2022.

Fig. 2 Methodological workflow and data analysis

The transition probabilities derived from MLP-ANN learning processes are fed into CA to predict and estimate the LULC changes in this hybrid model of CA-ANN [31, 49].

Image pre-processing

The satellite imageries of 2016, 2019, and 2022 were converted to spectral radiance values, and atmospheric correction was performed using Dark Object Subtraction (DOS) in the semi-automatic classification plugin (SCP) in QGIS. Thereafter, the images were mosaicked and subset using the shapefile of the municipal corporation limits of Dharamshala city. The shapefile of the municipal limits was geometrically corrected using ground control points (GCP) collected with GPS, such that the root mean square error (RMSE) was less than half a pixel [55].
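The two pre-processing steps can be sketched as follows. The radiance conversion uses the standard Landsat 8 linear rescaling of digital numbers (radiance = multiplicative factor × DN + additive factor). The dark-object step shown here is a deliberately simplified stand-in that subtracts the darkest radiance in the band; it is not the exact DOS procedure implemented in SCP. The rescaling factors and the 2 × 2 band are hypothetical values, not taken from the scenes used in the study.

```python
def dn_to_radiance(dn, mult, add):
    """Convert digital numbers to top-of-atmosphere spectral radiance (L = mult * DN + add)."""
    return [[mult * q + add for q in row] for row in dn]

def dark_object_subtraction(radiance):
    """Simplified DOS: treat the darkest pixel as pure atmospheric path radiance and remove it."""
    dark = min(min(row) for row in radiance)
    return [[max(v - dark, 0.0) for v in row] for row in radiance]

# Hypothetical band and rescaling factors (real values come from the scene metadata file).
band_dn = [
    [5000, 6000],
    [7000, 8000],
]
radiance = dn_to_radiance(band_dn, mult=0.012, add=-60.0)
corrected = dark_object_subtraction(radiance)
print(corrected)
```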

A modified Anderson LULC classification system was adopted to produce thematic maps comprising five LULC classes: Protected areas (PA), Agricultural areas (AA), Built-up areas (BA), Barren land (BL), and Water bodies (WB), as shown in Table 3, for the years 2016, 2019, and 2022. Supervised classification using MLC was used to create the five land cover classes [7, 20, 53, 56, 57]. The forests are protected under the Indian Forest Act, 1927, and the tea plantations are protected under the Himachal Pradesh Ceiling on Land Holdings Act, 1972; both were therefore classified as protected areas (PA).

Table 3 Description of the different LULC categories

Inputs

The LULC maps for 2016 and 2019 are taken as input and establish the spatio-temporal dynamics of the region. The MOLUSCE plugin was used to create a transition map between 2016 and 2019 showing the percentage change occurring in each of the five land cover types for the period from 2016 to 2019.

To use the CA model, the region should be a discrete gridded area, with each cell specifying a land cover type. The driving factors can be categorized by their spatial attributes, such as distance parameters, physical properties, and neighborhood relationships [58]. The distance parameters include distance from streams, roads, built-up areas, and the center of town. Physical properties include slope and elevation. Neighborhood relationships involve the percentage area of a land cover type around the cell of interest. The explanatory maps are extracted in raster format (Fig. 3).

Fig. 3 Explanatory map: slope, distance from streams, distance from roads, distance from built-up areas, distance from the center, and elevation

The transition functions are non-linear and represent the relationship between the driving factors and the transformation probabilities of each land cover type [26, 39]. The ANN model is trained on the explanatory maps, and the transition probabilities are then established for the CA model. The probability of transition from the current land use type to each LULC category at the subsequent time point, denoted as “t + 1,” is determined from the current LULC class of a specific cell as well as that of its neighboring cells at time t.
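The dependence of a cell’s state at t + 1 on its neighbors at time t can be illustrated with a toy cellular automaton. In this sketch a fixed threshold on the share of built-up neighbors stands in for the ANN-derived transition probabilities, so it is not the MOLUSCE algorithm; the 3 × 3 Moore neighborhood, the 0.5 threshold, and the choice of convertible classes are assumptions made for illustration.

```python
def neighbour_share(grid, r, c, target):
    """Fraction of the Moore (3 x 3) neighbours of cell (r, c) that hold class `target`."""
    rows, cols = len(grid), len(grid[0])
    count = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += 1
                count += grid[rr][cc] == target
    return count / total

def ca_step(grid, target="BA", convertible=("AA", "BL"), threshold=0.5):
    """One synchronous step: convertible cells become `target` when enough neighbours already are."""
    return [
        [target
         if cell in convertible and neighbour_share(grid, r, c, target) >= threshold
         else cell
         for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]

# Toy LULC grid at time t (class codes as in the text: BA, AA, PA).
lulc_t = [
    ["BA", "BA", "BA"],
    ["BA", "AA", "AA"],
    ["PA", "PA", "AA"],
]
lulc_t1 = ca_step(lulc_t)
for row in lulc_t1:
    print(row)
```

Every cell is evaluated against the grid at time t, so the update is synchronous; protected cells are excluded from conversion here purely as a modeling assumption of the sketch.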

Based on the spatio-temporal dynamics and the impact of the driving factors, the simulation is first performed for the year 2022; based on the performance of the model, predictions are thereafter made for the years 2025 and 2040 using two and seven iterations of the model, respectively.

Evaluating correlation and transition analysis

The correlation among the driving factors was examined using Cramér’s V coefficient, which is particularly suitable for contingency tables larger than 2 × 2. The coefficient ranges from 0 to 1, with higher values indicating a stronger correlation between the driving factors. A coefficient above 0.15 indicates substantial explanatory power of the variables [49]. The correlation matrix is shown in Table 4.

Table 4 Driving factors with Cramer’s V
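For reference, Cramér’s V is computed from the chi-square statistic of a contingency table as V = sqrt(χ² / (n · (k − 1))), where n is the number of observations and k is the smaller of the number of rows and columns. The sketch below implements this formula in plain Python; the contingency table is hypothetical (two driving factors binned into three classes each) and is not data from the study.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical cross-tabulation of two binned driving factors (cell counts).
slope_vs_road_distance = [
    [40, 15, 5],
    [20, 30, 10],
    [5, 15, 40],
]
v = cramers_v(slope_vs_road_distance)
print(round(v, 3))
```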

The changes (in area and percentage) in the land cover classes for the period 2016 to 2019 are shown in Table 5. The transition matrix, shown in Table 6, helps compare and understand the temporal transformations occurring in the region, without the impact of physical and socio-economic driving factors. The diagonal elements of the matrix represent class persistence, that is, the proportion of each land cover category that remains unchanged. Conversely, the off-diagonal entries represent the shifts between distinct classes [18]. Diagonal values close to 1 signify the stability of the corresponding land cover types over the chosen period.

Table 5 LULC change from 2016 to 2019
Table 6 Transition probability matrix for LULC change from 2016 to 2019
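A transition probability matrix of this kind can be derived by cross-tabulating two co-registered class maps and normalizing each row. The sketch below shows the idea on two tiny hypothetical grids; the function name and the toy maps are assumptions for illustration and do not reproduce the values in Table 6.

```python
from collections import Counter

def transition_matrix(before, after, classes):
    """Row-normalized transition probabilities between two co-registered class maps."""
    pairs = Counter(
        (b, a)
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )
    matrix = {}
    for src in classes:
        total = sum(pairs[(src, dst)] for dst in classes)
        matrix[src] = {
            dst: (pairs[(src, dst)] / total if total else 0.0) for dst in classes
        }
    return matrix

# Hypothetical 2 x 3 maps for the earlier and later dates.
lulc_early = [["PA", "PA", "AA"], ["PA", "AA", "BA"]]
lulc_late = [["PA", "BA", "AA"], ["PA", "BA", "BA"]]
tm = transition_matrix(lulc_early, lulc_late, classes=["PA", "AA", "BA"])
for src, row in tm.items():
    print(src, {dst: round(p, 2) for dst, p in row.items()})
```

Each row sums to 1, the diagonal entry is the share of the class that persisted, and the off-diagonal entries are the shares that moved to other classes.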

Transition potential modeling

The transformations occurring in a region are highly complex and depend on spatio-temporal changes and the driving factors responsible for them [26, 31]. Geographical phenomena, although non-linear and stochastic, have fractal properties [59], and machine learning algorithms such as MLP-ANN can be very useful in identifying these changes [45, 60]. The transition function for LULC change describes the association between the driving factors and the probabilities of conversion, specifically whether cells will shift towards a particular land use/cover class. The multi-layer feed-forward model is trained using error back propagation, wherein the network parameters are adjusted according to the output error [48, 58, 61]. The learning curve for the ANN-MLP is shown in Fig. 4.

Fig. 4 Neural network learning curve
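Error back propagation in a feed-forward network can be sketched with a very small multi-layer perceptron. This is not the MOLUSCE implementation: the single hidden layer, sigmoid activations, squared-error loss, learning rate, and the six training samples (normalized distance to road and normalized slope, with a label of 1 for cells that converted to built-up) are all assumptions chosen to keep the example short. The per-epoch mean error it records plays the role of a learning curve.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(samples, hidden=3, epochs=1000, lr=0.5, seed=0):
    """Train a one-hidden-layer MLP by stochastic gradient descent with back propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(hidden)]
        y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
        return h, y

    curve = []
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            h, y = forward(x)
            total += 0.5 * (y - t) ** 2
            # Output error, then error propagated back to each hidden unit.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
        curve.append(total / len(samples))

    return curve, (lambda x: forward(x)[1])

# Hypothetical samples: ([distance to road, slope], converted to built-up?), inputs scaled to 0..1.
samples = [
    ([0.10, 0.10], 1), ([0.20, 0.30], 1), ([0.15, 0.20], 1),
    ([0.80, 0.90], 0), ([0.90, 0.70], 0), ([0.70, 0.80], 0),
]
curve, predict = train_mlp(samples)
print("first epoch error:", round(curve[0], 4), "last epoch error:", round(curve[-1], 4))
```

Plotting `curve` against the epoch number would give a learning curve of the same general kind as the one reported for the study, although the values here depend entirely on the toy data.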

Validation

In LULC simulation, the cross-tabulation matrix, also referred to as a contingency table, error matrix, or confusion matrix, stands as an extensively utilized approach for the evaluation of outcomes [62]. Cross-tabulation facilitates a comparative analysis between the outcomes projected by the model and the observed outcomes [63]. In this matrix, each row corresponds to the anticipated category, while each column signifies the factual category, thereby showcasing discrepancies in the cells, often expressed as errors represented in percentages or areas [27, 64].

The assessment of accuracy was conducted utilizing overall accuracy and kappa hat statistics as the metrics of evaluation. Both metrics use the confusion matrix for calculation purposes. The determination of overall accuracy involves the consideration of diagonal elements only within the confusion matrix, while the kappa hat also considers non-diagonal elements and thus incorporates omission and commission errors [64]. Kappa hat evaluates the land modeling performance excluding chance agreement [65], with values ranging from 0.41 to 0.60 categorized as “moderate agreement” and 0.61 to 0.80 as “substantial agreement” [27, 66].
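Both metrics can be computed directly from a confusion matrix. Overall accuracy is the sum of the diagonal divided by the total count, and kappa is (p_o − p_e) / (1 − p_e), where p_o is the overall accuracy and p_e is the agreement expected by chance from the row and column totals. The sketch below uses a hypothetical 2 × 2 matrix, and the labeling helper covers only the two ranges quoted above.

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix of counts."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

def kappa_label(kappa):
    """Agreement label for the two ranges quoted in the text (0.41-0.60 and 0.61-0.80)."""
    if 0.40 < kappa <= 0.60:
        return "moderate agreement"
    if 0.60 < kappa <= 0.80:
        return "substantial agreement"
    return "outside the ranges quoted above"

# Hypothetical confusion matrix: rows = predicted class, columns = observed class.
cm = [
    [40, 5],
    [10, 45],
]
overall, kappa = accuracy_and_kappa(cm)
print(round(overall, 2), round(kappa, 2), kappa_label(kappa))
```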

Several simulations with different combinations of explanatory maps were performed, as shown in Table 7. The combination of distance from built-up areas, distance from roads, distance from the center of town, elevation, slope, and distance from streams showed the maximum accuracy and was chosen to predict the LULC for the year 2022. Comparison of the simulated and actual maps gave a kappa value of 0.77, denoting substantial agreement between the two maps, and an overall accuracy of 86.83%. It can be concluded that the chosen explanatory variables had a strong influence on the prediction of the LULC classes. The maps for the years 2025 and 2040 were predicted after running two and seven iterations in CA, respectively.

Table 7 Simulation results for different combinations of explanatory maps
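The iteration counts follow from simple arithmetic if each CA iteration is assumed to advance the map by the interval between the two input maps (three years, from 2016 to 2019), counted from the 2019 map. That assumption, and the helper name below, are illustrative and are not stated explicitly in the text.

```python
def ca_iterations(base_year, target_year, step_years):
    """Number of CA iterations needed to reach `target_year` from `base_year`."""
    span = target_year - base_year
    if span <= 0 or span % step_years:
        raise ValueError("target year must lie a whole number of steps after the base year")
    return span // step_years

step = 2019 - 2016  # interval between the two input maps
for year in (2022, 2025, 2040):
    print(year, ca_iterations(2019, year, step))
```

Under this assumption the 2022 validation map corresponds to one iteration, 2025 to two, and 2040 to seven.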

Results and discussion

The LULC distribution for the years 2016, 2019, and 2022 is shown in Table 8. Table 9 shows the area-wise and percentage-wise transition of each LULC class from 2016 to 2019 and from 2019 to 2022. Positive values indicate an increase in a land cover class, while negative values indicate a decrease. The spatio-temporal distribution of the LULC classes for the years 2016, 2019, and 2022 is shown in Fig. 5. Protected areas underwent the largest transition from 2016 to 2022, with a reduction of 11.85%, or 5.04 km² in area. The built-up areas increased considerably, by 14.54%, or 6.18 km² in area. The agricultural areas decreased by 2.73%, or 1.16 km² in area, and a slight increase in barren land was also observed. This signifies the impact of anthropogenic and socio-economic activities in the city and the rapid conversion of this hill station into a concrete jungle. The results also indicate widespread encroachment and weak enforcement of legislation.

Table 8 LULC distribution for the years 2016, 2019, and 2022
Table 9 LULC change analysis for the years 2016, 2019, and 2022
Fig. 5 LULC maps for the years 2016, 2019, and 2022

The increase in built-up areas and barren land for the period 2016–2022 is primarily related to the increasing human population and tourist inflow in the city, leading to additional need for residential and commercial spaces. This led to high pressure on the protected areas and agricultural areas, which had suffered maximum depreciation for this period.

The region lying at an altitude of less than 1500 m remained the most critical with maximum changes in LULC classes being witnessed there. The built-up areas, agricultural areas, and protected areas showed maximum transition in this region. The main reason for this could be attributed to the better transportation facilities, road connectivity, suitable climatic conditions for living and agricultural practices, commercial establishments, and more population concentration in this region. Higher altitude regions, because of terrain and other geographical constraints, are less vulnerable to built-up areas. Thus, the city requires greater concern and attention from policymakers and environmentalists to pave the way for a balanced, holistic, and sustainable development model.

The simulation and accurate prediction of LULC are necessary to understand the trend and direction of urban sprawl. The LULC maps of 2025 and 2040 were prepared using CA modeling, and their spatial distribution is shown in Fig. 6. Six driving factors were chosen for the modeling: distance from built-up areas, distance from roads, distance from the center of town, elevation, slope, and distance from streams.

Fig. 6 Predicted LULC maps for the years 2025 and 2040

The LULC change analysis of the maps from 2016 to 2025 and from 2016 to 2040 is shown in Tables 10 and 11. The results indicate that the increase in built-up areas and the decrease in protected areas will continue up to 2025. However, the growth of built-up areas will saturate after 2025, and the percentage increase in built-up areas over each 3-year period will be smaller than in the preceding 3-year transition. This could be attributed to the exhaustion of most of the usable and productive areas available for construction.

Table 10 LULC change analysis from 2016 to 2025
Table 11 LULC change analysis from 2016 to 2040

The hilly areas offer geographical and topographical constraints for construction, and thus, the ideal locations for construction are usually those located at mid-altitudes and having less slope. The seismicity of the area is another challenge. All these factors will lead to construction in high seismic and landslide-prone areas, which would present a significant impediment to the well-being and security of the inhabitants. Another important observation from the findings was that the transition of built-up areas on the temporal scale is usually restricted to mid and south-eastern regions of the study area. The region has witnessed urban sprawl in these pockets and will remain a critical region in the future.

The swift expansion of urbanized regions, stemming from demographic expansion and the influx of tourists, emphasizes the critical significance of implementing sustainable urban planning strategies. Effective land-use management strategies should be implemented by policymakers and urban planners involving the promotion of efficient land use, reducing urban sprawl, and preserving green spaces, contributing to the attainment of Sustainable Development Goal (SDG) 11, which focuses on creating sustainable cities and communities.

The decline in protected areas is a matter of concern as it poses a threat to biodiversity and ecosystems. Strict implementation of legislation, with the involvement of environmentalists and policymakers, can help protect and restore these areas, thus preserving biodiversity and ensuring the long-term sustainability of natural resources. This effort directly relates to SDG 15, which focuses on maintaining and enhancing life on land.

Land-use planning plays a crucial role in fostering responsible consumption and production patterns. By optimizing land use and preventing further encroachment on protected areas, policymakers can contribute to sustainable resource management and reduce the environmental impact of human activities, which aligns with the objectives of SDG 12, aiming to ensure responsible consumption and production.

The increasing population and tourists will remain the major driving factors for the change. The decrease in agricultural areas indicates a shift in agriculture practice, which lately has been the preferred occupation of the residents. Further, the decrease in protected areas indicates the persistent encroachments and abeyance of legislation. In order to address the decreasing agricultural areas, it is crucial to promote sustainable farming practices and increase agricultural productivity to address the escalating requirements of sustenance. This can be accomplished through the implementation of innovative techniques, support for small-scale farmers, and ensuring food security for all, thereby working towards achieving Zero Hunger (SDG-2).

Conclusions

The study applied an ANN-based CA approach to the prediction of land cover classes, which showed substantial agreement between the simulated and the actual LULC maps, with a kappa value of 0.77. The model incorporated six driving factors: four socio-economic spatial parameters (distance from built-up areas, roads, the center of town, and streams) and two geospatial parameters (elevation and slope). This combination performed best in the CA-ANN model, giving the highest accuracy of 86.83%.

The selection of these factors was based on their potential influence on the study’s outcomes. For instance, proximity to built-up areas may impact pollution levels and development rates, while distance from roads may correlate with traffic noise and urbanization patterns. Elevation and slope could affect water resource accessibility, and proximity to streams might indicate water source quality.

The study predicts that, relative to 2016, the built-up areas will increase by 17.84% by the year 2025 and by 19.69% by the year 2040. The protected areas will decrease by 14.75% and 16.66%, agricultural areas by 2.81% and 2.72%, and barren land by 0.29% and 0.31% for the years 2025 and 2040, respectively.

The rapid increase in population and tourism has led to a significant rise in built-up areas, creating an urgent demand for more land and putting undue pressure on protected areas and agricultural areas. Strict implementation of legislation is necessary to prevent further encroachments in the protected areas. Studying the critical land-use classes in terms of socio-ecological and environmental concerns is valuable for balancing environmental pressures and conservation interventions. The findings can offer guidance to administrators, policymakers, agricultural practitioners, and urban planners in formulating methodologies for sustainable land-use planning and management, fostering the optimal utilization of natural resources.