Introduction

In recent decades, the rapid rise in population and built-up areas has had detrimental effects on the natural environment across various scales (Herold et al. 2005). Anthropogenic activities linked to urbanization degrade natural landscapes, particularly forests and grasslands, resulting in environmental issues such as increased air pollution and rising surface temperatures (Bayraktar et al. 2010; Yang et al. 2014; Nasehi et al. 2019). Furthermore, uncontrolled urban expansion leads to urban sprawl, adversely impacting environmental, economic, and social factors crucial for sustainable development (Nijman and Wei 2020; Shao et al. 2021). In this context, accurate information on the current and future state of urban expansion is crucial for decision makers due to the many complex factors affecting sustainable development (Yasin et al. 2020; Shao et al. 2021). Therefore, interest in the analysis of Land Use and Land Cover (LULC) changes, as well as in future LULC modelling, has increased significantly in recent years. Many researchers have focused on studying urban expansion using various techniques, as demonstrated by the studies of Hamad et al. (2018), Asadi et al. (2022), Kisamba and Li (2022), and Mahmoudzadeh et al. (2022).

Changes in LULC should be monitored regularly to analyse historical change trends and to demonstrate potential future urban expansion. Today, remote sensing techniques have proven highly effective in detecting changes caused by urban expansion. Over the past decades, these techniques, along with Geographical Information Systems (GIS), have been extensively applied in urban expansion research (Batisani and Yarnal 2009; Bagan and Yamagata 2012; Liu et al. 2022; Taiwo 2022). As high-resolution remote sensing imagery has become accessible, many researchers have used remotely sensed satellite imagery to map impervious surfaces (i.e., mostly artificial structures) (Yang and He 2017), assess the spatio-temporal pattern of urban expansion (Akintunde 2019; Zhong et al. 2020) and simulate future urban expansion (Aburas et al. 2017; Karimi et al. 2018). Talukder et al. (2022) used different supervised classifiers, including random forest and other machine learning algorithms, to classify Landsat imagery from 2017 to 2020 and investigate LULC changes and the rapid expansion of the urban area in the Tangail district of Bangladesh. Such findings on the expansion of urban areas are very important for urban planners to cope with rapid urbanisation locally.

One of the most important changes caused by LULC change and urban expansion is the increase of impervious surfaces and decrease of vegetation cover in the urban environment. Such landscape transformations also cause a significant change in Land Surface Temperature (LST). In a study by Son and Thanh (2018), the impact of urban sprawl on the local temperature of Cantho city in Vietnam was investigated. Using Landsat data from 1996 to 2016 to map the built-up areas with Random Forest (RF), a ten-year assessment of the built-up area showed that the city has experienced significant expansion. In addition, the average LST increased by 0.8 °C over the study period because urban expansion converted open land to artificial surfaces.

Ullah et al. (2020) analysed the effects of urban expansion on LST and vegetation cover in Xi’an, China, covering three decades (1987–2018). For this purpose, potential driving factors were analysed, and the Normalised Difference Vegetation Index (NDVI) was used. The findings revealed a significant increase of 69.6% in urban areas over the last 30 years, accompanied by a decrease in vegetation cover from 2002 to 2018. By analysing the correlation between LST, urban area and NDVI, the study highlighted the impact of urban expansion on LST and concluded that vegetation cover can mitigate the negative effects of increasing temperature to some extent.

Understanding, modelling, and projecting land dynamics at both the regional and global scales is essential due to the significant effects of LULC change and urban expansion (Islam and Ahmed 2011). In recent years, researchers have used different models to investigate future LULC changes, providing valuable insights for urban planners and policymakers to foster sustainable urban development (Hyandye and Martz 2017). Among these modelling techniques, Cellular Automata (CA) and Markov Chain (MC) have gained popularity (Ghosh et al. 2017). CA models are preferred for their ability to integrate spatial and temporal aspects of processes while MC models analyse the likelihood of land cover transformations (Santé et al. 2010; Moghadam and Helbich 2013). The MC model predicts future changes by analysing past trends, whereas CA focuses on the spatial characteristics of changes. Consequently, the CA-Markov model, integrating the strengths of both models, is considered a more efficient and effective tool for simulating such changes. Moreover, the successful integration of the CA-Markov model with remote sensing and GIS has led to its widespread use in geospatial environmental modelling within Earth sciences (Ghosh et al. 2017).

Numerous studies have employed the CA-Markov model to assess and predict future LULC changes and urban expansion (Karimi et al. 2018; Novin and Ebrahimipour 2019; Mohamed and Worku 2020; Getu and Bhat 2022; Kumar et al. 2023). For instance, Hyandye and Martz (2017) applied the CA-Markov model in Tanzania’s Usangu Catchment, projecting an 8.2% increase in urban areas by 2020. They also observed a significant decrease in forestland and shrubs by 20% and 7%, respectively. Similarly, Kumar et al. (2023) addressed the rapid expansion of urban areas, predicting a 55% growth in urban areas by 2025, leading to substantial forest and vegetation loss. Aburas et al. (2021a) conducted a spatio-temporal simulation of Seremban’s future urban expansion in Malaysia. By simulating the 2010 map using LULC maps from 1990 to 2000, they were able to predict urban and non-urban areas with an accuracy rate of 83%. The CA-Markov model was then used to forecast urban expansion for 2020 and 2030, demonstrating the applicability of the model in a variety of urban settings, particularly in developing countries. In research carried out by Kisamba and Li (2022), the urban expansion of the Dodoma urban district in Tanzania was simulated using the CA-Markov model. The LULC change analysis from 2005 to 2020 revealed a significant transformation of bare land and vegetation into built-up areas. The 2020 simulation was validated with the actual map of the same year, and projections for 2030 indicate a continued increase in built-up areas.

While numerous studies in the literature predominantly focus on urban expansion at the city level, there is a notable lack of district-based analyses, particularly for dynamic megacities. Undertaking local-scale investigations at the district level offers a unique perspective, providing profound insights into the causes and consequences of urban expansion. These local analyses, in turn, provide valuable information that contributes to broader regional assessments at the city level (Habibi and Asadi 2011; Azabdaftari 2022; Azabdaftari and Sunar 2022). This lack of studies at the local scale, combined with the necessity to comprehend the complexities of megacity dynamics, is the main driving force behind our research. Our study aims to address this gap by using a prediction model specifically designed for urban expansion in the highly urbanized districts of Istanbul, Türkiye, and Sydney, Australia. Furthermore, we aim to demonstrate how modelling urban expansion at the district level can provide detailed insights, aiding urban planners and decision-makers in evaluating LULC changes and their associated environmental impacts with greater detail and precision.

Study area

Istanbul, the largest and most populous city in Türkiye, has become the epicentre of urbanization in the country. The increase in population has created a growing demand for housing, driving urban expansion, especially in outlying districts situated away from the city center. The study area selected for this research is the Arnavutköy district, located at 41.1946° North latitude and 28.7532° East longitude (Fig. 1). The selection of Arnavutköy is motivated by the establishment of ‘Istanbul Airport’ and the planned ‘Project Canal Istanbul (PCI)’ (Azabdaftari and Sunar 2022). These initiatives are expected to significantly impact the area’s population. Therefore, it is crucial to carefully examine how urban planning will change as the region becomes more attractive (Akın et al. 2015; Georgieva et al. 2015; Akyürek et al. 2018; Avcı and Sunar 2018).

Sydney, the state capital of New South Wales (NSW), is the most populous city in Australia, home to an estimated 5 million people. According to the most recent statistics from the Planning and Environment Department, Sydney will likely experience urban expansion (NSW, 2023). Due to Sydney’s proximity to the ocean in the east and the Blue Mountains in the west, this urban expansion is anticipated to occur in the north-western direction. Urban planners regard the Hills Shire Local Government Area (LGA), which is positioned at 33.6049° South latitude and 150.9773° East longitude, as a district with substantial potential for expansion. The selection of this district is mainly motivated by the ongoing large-scale construction of residential and commercial buildings in the region (Fig. 2). Additionally, the transportation hubs that offer quick access to Sydney’s centre make this area more attractive (Azabdaftari and Sunar 2022).

Fig. 1 The map of (a) Türkiye, (b) Istanbul, and (c) satellite image of Arnavutköy district

Fig. 2 The map of (a) Australia, (b) Sydney NSW, and (c) satellite image of Hills Shire LGA

Data sources

In remote sensing studies, a common approach involves integrating satellite imagery with ancillary data to ensure comprehensive and accurate analyses. As satellite data, six cloudless Landsat 5 and Landsat 8 images were used: images from 1997, 2007, and 2017 for the Arnavutköy district, and images from 1996, 2007, and 2018 for Hills Shire LGA. Additional details about the satellite data can be found in the previous study by Azabdaftari and Sunar (2022). In addition to the satellite data, multiple ancillary datasets, as outlined in Table 1, including the land use map, administrative boundaries, and others, were used in the analysis.

Table 1 Ancillary data used in the analysis

Methodology

Figure 3 shows the methodology used for modelling urban expansion in both districts selected in this study and will be described in more detail in the following sections. However, before performing the main image processing steps, pre-processing is necessary to address potential errors. These errors, which may affect the geometry or brightness values of the image, are usually introduced when data is captured by satellite sensors (Richards 1999). In this context, radiometric and geometric corrections have been applied as the first stage of image analysis to improve the quality and accuracy of satellite images, as reported in previous research (Azabdaftari and Sunar 2022).

Fig. 3 Flowchart of the methodology used in this study

Image classification and accuracy assessment

Image classification extracts LULC information from images, creating thematic maps by identifying distinct land covers through unique spectral signatures (Alshari and Gawali 2021). This study utilized the Maximum Likelihood Classification (MLC), a widely employed supervised technique with reported accuracies between 80% and 90% (Bharath et al. 2017; Roy and Kasemi 2021). MLC assigns pixels to classes based on maximum likelihood statistics (Richards 1999). Applying MLC, LULC maps for 1997, 2007, and 2017 in Arnavutköy and 1996, 2007, and 2018 in Hills Shire were generated. In Arnavutköy, the identified classes include built-up area, agricultural area, forest, barren land, and wetland. Likewise, five classes were identified in Hills Shire: built-up area, agricultural area, forest, quarry, and water bodies (Azabdaftari and Sunar 2022).

Accuracy assessment, a critical post-classification step, compares classified images with reference data. In Arnavutköy, overall accuracy rates were 91%, 86%, and 88% for 1997, 2007, and 2017, respectively. Similarly, Hills Shire LGA showed rates of 92%, 95%, and 91% for 1996, 2007, and 2018, respectively (Azabdaftari and Sunar 2022).

Change detection

The aim of change detection analysis, which is the process of monitoring the changes that occur in a given area over time, is to understand the characteristics, rate, spatial pattern and underlying causes of these changes (Sunar et al. 2017). The widely employed post-classification comparison method in this context involves analyzing two classified images. To assess changes in LULC classes for the Arnavutköy district (1997–2017) and the Hills Shire LGA (1996–2018), transition matrices were calculated by comparing thematic maps from different years within the study period (Azabdaftari and Sunar 2022).

Land surface temperature

The Land Surface Temperature (LST) was retrieved using the land surface temperature algorithm, as detailed in a prior article (Azabdaftari and Sunar 2022). Additionally, for a more comprehensive insight into the impact of urban expansion on the LST of the study areas over time, the Normalized Difference Vegetation Index (NDVI) was computed using the following formula.

$$NDVI\; = \;(NIR\; - \,R)\;/\;(NIR\; + \;R)$$
(1)

where NIR is the near-infrared band, and R is the red band. The index takes values between −1 and 1 (Morshed et al. 2020).
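
As an illustration of Eq. 1, the following minimal sketch computes NDVI for a few pixels. The reflectance values are hypothetical, and returning 0 where NIR + R equals zero is an assumption made here to avoid division by zero; it is not a step described in this study.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - R) / (NIR + R) for paired band values."""
    result = []
    for n, r in zip(nir, red):
        total = n + r
        # Assumption: pixels with no signal in either band are set to 0.
        result.append((n - r) / total if total != 0 else 0.0)
    return result


# Hypothetical reflectances for three pixels:
# dense vegetation, bare soil, and water.
nir = [0.50, 0.30, 0.02]
red = [0.10, 0.25, 0.05]
values = ndvi(nir, red)
```

Vegetated pixels yield values well above zero, whereas water yields negative values, which is why a declining mean NDVI is read as vegetation loss.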

Urban modelling

Urban modeling plays a pivotal role in urban planning, offering insights into city evolution over time (Batty 1976, 2013; Epstein 2008; Crooks and Heppenstall 2012). Simulations become necessary to observe urban systems as cities progress, with GIS integration revolutionizing the field as a key tool for analyzing changes (Su 1998).

Various modeling methods are widely used in the literature for future prediction, including CA, MC, and Artificial Neural Network (ANN). Each method has its own pros and cons; however, integrated models have consistently shown higher accuracy. Specifically, the CA-Markov model is the most commonly used approach among integrated models in the literature and is also employed in this study (Guan et al. 2011; Al-sharif and Pradhan 2014; Durmusoglu and Tanriover 2017; Hamad et al. 2018; Aburas et al. 2021).

The CA-Markov model combines the spatial analysis capabilities of Cellular Automata with the statistical strength of Markov chains, offering a robust tool for urban expansion prediction (Abdelkarim 2023; Roushangar et al. 2023). This model is particularly effective at managing spatial interactions and incorporating probabilistic transitions, making it well suited to detailed, district-level planning. In contrast, Markov chains alone lack spatial considerations, as they process transitions independently of location, while Cellular Automata alone do not include probabilistic components, potentially disregarding important patterns in land use changes (Lihawa et al. 2022). The ANN models, although capable of handling complex patterns, face challenges like high complexity and computational demands, making them less suitable for large-scale urban planning. A comparative study by Asadi et al. (2022) demonstrated that while the ANN models were slightly more precise in smaller, fragmented regions such as soil and rocks, the CA-Markov model was more effective at predicting larger, continuous urban areas. Hence, the CA-Markov model is preferred for urban expansion modelling due to its effectiveness in urban contexts.

Further, the accuracy of the CA-Markov model in urban simulation is well-documented. For instance, Aburas et al. (2021) reported an 83% accuracy rate using the CA-Markov model in Seremban, Malaysia. Kisamba and Li (2022) achieved a Kappa index over 0.75 in Dodoma, Tanzania. Aliani et al. (2019) in Thales County also highlighted the model’s proficiency in urban land use change prediction, with an overall simulation success rate estimated at 82.2%. These studies underscore the CA-Markov model’s effectiveness in accurately simulating urban growth and land-use changes.

Cellular automata

Cellular Automata is a discrete and dynamic system where local interactions between components lead to global changes in space and time (Batty 1976; Wolfram 2002; Yacoubi 2008).

A CA is also defined as a quadruple:

$$A =\left(\mathcal{L}, S, N, f\right)$$
(2)

where \(\mathcal{L}\) represents a d-dimensional arrangement of cells, organized according to their spatial dimensions and cell shape; S is the finite set of states a cell can take; N defines the cell’s neighbourhood; and f is the transition function that computes the state of a cell at time t + 1 based on the state of its neighbourhood at time t.

In CA, either discrete or continuous, cells serve as storage units with various state variables. These variables are simultaneously updated at each time step according to “transition rules,” considering the neighboring cells’ values from the previous step. This rule applies uniformly to all grid cells, and the new state of each cell is stored in a separate grid to facilitate the transition process (Wolfram 1984; Chopard and Droz 1998; Breckling et al. 2011).

As stated in the study of Czaran and Bartha (1992), transition rules can be either stochastic or deterministic in nature, as expressed in the following equation:

$${a}_{t+1}^{s}=f\left({a}_{t}^{s-r},\dots ,{a}_{t}^{s},\dots ,{a}_{t}^{s+r}\right)$$
(3)

where, \({a}_{t}^{s}\) is the state of cell s at time t, r is the neighbourhood range of the cell and f is the transition function that represents the transition rules.
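
The synchronous update described by Eq. 3 can be sketched for a one-dimensional grid as follows. The majority rule, the clamped boundary handling, and the binary coding (1 = built-up, 0 = other) are illustrative assumptions, not the transition rules used in this study.

```python
def step(cells, r, rule):
    """Synchronously update a 1-D cellular automaton by one time step.

    cells: list of cell states at time t
    r:     neighbourhood range around each cell s
    rule:  transition function f mapping a neighbourhood tuple to a new state
    """
    n = len(cells)
    new_cells = []  # separate grid, so every cell only sees states at time t
    for s in range(n):
        # Assumption: indices beyond the grid are clamped to the edge cell.
        neighbourhood = tuple(
            cells[min(max(k, 0), n - 1)] for k in range(s - r, s + r + 1)
        )
        new_cells.append(rule(neighbourhood))
    return new_cells


def majority(neighbourhood):
    """Illustrative rule: a cell becomes 1 if most of its neighbourhood is 1."""
    return 1 if sum(neighbourhood) * 2 > len(neighbourhood) else 0


state_t = [0, 0, 1, 1, 1, 0, 1, 0, 0]
state_t1 = step(state_t, r=1, rule=majority)
# state_t1 == [0, 0, 1, 1, 1, 1, 0, 0, 0]
```

Writing the new states to a separate list mirrors the separate grid mentioned above: no cell is updated from a neighbour that has already advanced to time t + 1.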

CA accurately models dynamic and spatial processes as it focuses on how cells interact with changing features in time and space. Furthermore, its compatibility with GIS leads researchers to utilize CA in LULC simulations, yielding realistic spatial structure models (White and Engelen 2000; Ye and Bai 2008; Sang et al. 2011).

Markov chain model

A Markov Chain or Markov process is a mathematical model used to describe the progression of a system from one state to another. It predicts sequences of events where the probability of each event is influenced by the previous event (Muller and Middleton 1994; Seneta 2006).

In a Markov sequence consisting of X0, X1, X2,…,Xm random variables in a state space (S1, S2,…, Sn), the Markov Chain can be expressed as:

$$P\left(X_n = S_n \mid X_{n-1} = S_{n-1}\right) = P\left(X_n = S_n \mid X_0 = S_0, X_1 = S_1, \dots, X_{n-1} = S_{n-1}\right)$$
(4)

Markov Chain analysis generates probability maps and a matrix known as the transition probability matrix, containing details about the probabilities of transitioning from one state to another (Iacono et al. 2015; Hamad et al. 2018).

A transition matrix details the probabilities of transitioning between states. If Pt is the transition matrix of the Markov Chain at time t, with the rows and columns of the matrix indexed by the state space S (Ye and Bai 2008), the (i, j)th element of the matrix Pt can be expressed as:

$${ \left({P}_{t}\right)}_{i,j}=P({X}_{t+1}=j|{X}_{t}=i)$$
(5)

Equation 5 indicates that each row of the matrix is a probability vector and the sum of its entries is equal to 1. In conclusion, Markov Chain analysis offers insights into the probability of state changes by creating probability maps and this transition matrix.
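
To make Eq. 5 concrete, the sketch below estimates a transition probability matrix by cross-tabulating two co-registered categorical maps, flattened to lists, and normalising each row. The class codes and the toy maps are hypothetical, and treating a class that is absent at time t as persistent is an assumption made only to keep every row a valid probability vector.

```python
def transition_probabilities(earlier, later, n_states):
    """Estimate P[i][j] = P(X_{t+1} = j | X_t = i) from two aligned maps."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(earlier, later):
        counts[i][j] += 1  # one pixel moving from class i to class j
    probs = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a class with no pixels at time t is kept unchanged.
            probs.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            probs.append([c / total for c in row])
    return probs


# Hypothetical class codes: 0 = built-up, 1 = agricultural, 2 = forest.
map_t = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
map_t1 = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
P = transition_probabilities(map_t, map_t1, n_states=3)
# P == [[1.0, 0.0, 0.0], [0.25, 0.75, 0.0], [0.0, 0.25, 0.75]]
```

Each row sums to 1, as required for a probability vector; the off-diagonal entries give the estimated share of a class that converts to another class over one time step.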

CA-Markov model

The CA-Markov model seamlessly integrates CA and MC modules, enhancing its capabilities beyond the individual strengths of each model. This integration effectively addresses the limitations associated with using a single model, as it combines the dynamic simulation features of CA with the statistical approach of MC. Consequently, the integrated CA-Markov model is widely used as a more robust and efficient technique for simulating various scenarios (Alkheder et al. 2006; Guan et al. 2011; Akin et al. 2014; Durmusoglu and Tanriover 2017; Hyandye and Martz 2017; Roushangar et al. 2023).

Prediction of future LULC dynamics

To predict future LULC changes and urban expansion in any region using CA-Markov modelling, four key inputs are required: (i) LULC maps (generated from satellite image classification), (ii) suitability maps (created through Multi Criteria Evaluation (MCE)), (iii) transition area matrix (developed using the MC model), and (iv) a contiguity filter (the standard 5 × 5 filter).

Multi criteria evaluation

MCE is a decision-making methodology that evaluates alternatives based on multiple criteria to determine the optimal choice (Kilic and Kaya 2015). In the context of CA-Markov modelling for suitability maps in urban planning and development analysis (Eastman 2012; Hyandye and Martz 2017), the MCE process involves four sequential steps:

  • Criteria Identification and Development: Various factors and constraints are combined to create a suitability map, with constraints limiting development in specific areas, and factors influencing suitability through their appropriateness (Eastman 2006; Drobne and Lisec 2009).

  • Criteria Standardization: Different factors, each with unique scales and units, are standardized using methods like fuzzy functions, translating criteria into a common scale (0–1 or 0–255) indicative of suitability levels (Alkema et al. 2014).

  • Criteria Weighting: After standardization, weights are assigned to each criterion to determine its relative importance in comparison to others, with available methods including equal weights, user-defined weights, and the Analytic Hierarchy Process (AHP); for this study, the user-defined method was employed.

  • Weighted Criteria Combination: Using the Weighted Linear Combination (WLC) method, this step aggregates the standardized, weighted factors, multiplying each criterion’s value by its weight and summing these to create a composite map (Eastman 2003).
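
The final step above can be sketched as follows. This is a minimal illustration of a weighted linear combination with Boolean constraints applied as multiplicative masks; the layer names, scores, and weights are hypothetical and do not correspond to the values in this study.

```python
def weighted_linear_combination(factors, weights, constraints):
    """Combine standardized factor layers into one suitability layer.

    factors:     list of layers, each a list of scores on a common scale
    weights:     one weight per factor, assumed here to sum to 1
    constraints: list of Boolean layers (1 = allowed, 0 = excluded)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    suitability = []
    for cell in range(len(factors[0])):
        # Weighted sum of the standardized factor scores for this cell.
        score = sum(w * layer[cell] for w, layer in zip(weights, factors))
        # A constraint value of 0 masks the cell out completely.
        for mask in constraints:
            score *= mask[cell]
        suitability.append(score)
    return suitability


# Hypothetical standardized layers (0-255) for four cells.
proximity_to_roads = [255, 200, 100, 50]
slope_suitability = [240, 120, 200, 10]
protected_mask = [1, 1, 0, 1]  # third cell lies in a protected area

result = weighted_linear_combination(
    [proximity_to_roads, slope_suitability], [0.6, 0.4], [protected_mask]
)
```

Because the weights sum to 1, the composite score stays on the same scale as the input factors, while any constrained cell receives a suitability of zero regardless of its factor scores.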

Model validation

To assess the model’s simulation capability, the simulated map is compared with a reference map using K-indices (Kno, KLocation, KLocationStrata, and KStandard), assessing overall and location-specific accuracy (Pontius Jr and Millones 2011). Kappa values, ranging from 0 to 1, indicate similarity, with values below 0.4 suggesting low accuracy, 0.4 to 0.75 indicating moderate accuracy, and above 0.75 representing high agreement (Pontius Jr and Millones 2011; Wang et al. 2011; Hamad et al. 2018; Qiu and Lu 2018). This study set a Kappa value of at least 0.75 as the acceptable level of agreement.
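
For orientation, the sketch below computes the standard Kappa statistic from a reference map and a simulated map, both flattened to lists of class codes. It covers only the standard index and assumes it corresponds to the conventional Cohen's Kappa; the location-based variants require additional terms not shown here, and the example maps are hypothetical.

```python
def kappa_standard(reference, simulated, n_classes):
    """Return the standard Kappa agreement between two categorical maps."""
    n = len(reference)
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for r, s in zip(reference, simulated):
        matrix[r][s] += 1  # rows: reference class, columns: simulated class
    # Observed agreement: share of cells on the diagonal.
    observed = sum(matrix[k][k] for k in range(n_classes)) / n
    # Agreement expected by chance from the class proportions of both maps.
    expected = sum(
        (sum(matrix[k]) / n) * (sum(row[k] for row in matrix) / n)
        for k in range(n_classes)
    )
    if expected == 1.0:
        return 1.0  # both maps contain a single, identical class
    return (observed - expected) / (1.0 - expected)


# Hypothetical maps with three classes.
reference = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
simulated = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]
kappa = kappa_standard(reference, simulated, n_classes=3)
```

In this toy case the observed agreement is 0.80 and the chance agreement is 0.38, giving a Kappa of about 0.68, which would fall in the moderate range under the thresholds quoted above.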

Results and discussion

Temporal trends in LST

In the analysis, temporal trends in LST were derived for two distinct regions, the Arnavutköy district and the Hills Shire LGA, providing a clear depiction of how LST varies over time in each area. In addition, the distribution of LST ranges as a percentage of the total study area was calculated for both regions (Figs. 4 and 5).

Fig. 4 Spatial distributions of LST for the Arnavutköy district in (a) 1997, (b) 2007, and (c) 2017

Fig. 5 Spatial distributions of LST for Hills Shire LGA in (a) 1996, (b) 2007, and (c) 2018

The LST findings for the Arnavutköy district, as depicted in Fig. 4, reveal a remarkable temporal trend over the 20-year period. In 1997, the majority of the region (68%) exhibited an LST below 25°C, with residential areas ranging between 26 and 28°C. By 2007, a significant temperature increase is observed, particularly in the higher temperature ranges (29–31°C). The highest LST values observed in the northern parts of the region, particularly in 2017, are likely attributed to the construction of Istanbul Airport and the associated building development, contributing to the overall temperature increase in the region. This discernible change is evident in the LST range higher than 32°C, which escalated from only 0.1% in 1997 to 29% in 2007 and further to 50% in 2017. As of 2017, this proportion covers more than half of the study area, indicating a substantial warming trend in the region.

In the Hills Shire LGA, illustrated in Fig. 5, a substantial LST increase occurred over the study period. In 1996, 62% of settlements experienced LST values within 24–28°C, diminishing notably to 42% by 2007. By 2018, a shift was evident, with 52% recording LSTs in the 29–33°C range and 45% exceeding 34°C. This shift indicates a noteworthy distribution change toward higher temperatures. In conclusion, the analyses affirm a significant rise in regions with the highest temperatures over the twenty-year period, notably in the > 34°C range.

To better understand the impact of urban expansion over the study period, mean LST and mean NDVI were calculated for both built-up and non-built-up areas in each study area. The results for Arnavutköy district and Hills Shire LGA are presented in Fig. 6.

Fig. 6 LST and NDVI values for built-up and non-built-up areas in both study areas

Figure 6 illustrates that, in Arnavutköy district across all measured years, built-up areas consistently displayed higher mean LST values compared to non-built-up areas. Additionally, the mean NDVI in built-up areas declined from 0.10 in 1997 to 0.07 in 2007, indicating a transformation from vegetated to non-vegetated areas during this period. Notably, the reduction in the mean NDVI value in 2007 was also evident in non-built-up areas, potentially attributed to the formation of small wetlands due to mining activities. In 2017, a significant decline in the vegetation class was observed, likely attributed to the rapid construction of the Istanbul Airport (Akyürek et al. 2018).

Moreover, Fig. 6 clearly demonstrates a gradual increase in the mean LST trend in both built-up and non-built-up areas of the Hills Shire LGA. Meanwhile, the mean NDVI values have declined from 1996 to 2018, although this change trend is not as abrupt or sharp as observed in the Arnavutköy district. The observed temporal trend changes in both districts confirm that urban expansion, resulting in a reduction of vegetated areas, had a substantial impact on LST values throughout the study period.

Prediction of future LULC dynamics

To predict future land use dynamics and urban expansion using CA-Markov modelling, essential inputs include LULC maps from satellite image classification, suitability maps via MCE, a transition area matrix from the MC model, and a 5 × 5 contiguity filter. The modelling process was conducted using the IDRISI Selva software, which is specifically designed for spatial analysis.

LULC maps

The LULC maps were generated for 1997, 2007, and 2017 in Arnavutköy and 1996, 2007, and 2018 in Hills Shire using the MLC algorithm, with detailed results reported in a previous study by Azabdaftari and Sunar (2022).

Suitability maps via multi criteria evaluation

Arnavutköy’s suitability maps were created to simulate the 2017 LULC map using data from 1997 to 2007, while Hills Shire LGA’s suitability maps were generated to simulate the 2018 LULC map based on data from 1996 to 2007. In generating the LULC suitability maps, various factors influencing changes across all LULC classes were considered, with specific attention to constraints, particularly in built-up areas, within the framework of urban expansion. In this study, MCE analysis was conducted on the predominant LULC classes (built-up, agricultural, and forest areas), which collectively constitute the majority of both study areas.

Urban development is influenced by several factors, including proximity to existing built-up areas and transportation networks, as well as topographical conditions such as terrain characteristics. Data from 2007 indicate that areas closer to existing infrastructure are more suitable for development, while regions with high elevations and steep slopes are less suitable due to the challenges they pose for construction and infrastructure.

Additionally, soil maps, highlighting different soil types, play a crucial role in determining agricultural land suitability. These maps provide vital information on the dominant soil types and their properties, which help assess land appropriateness for farming, planning, and environmental protection. The suitability for agriculture diminishes as the slope increases, thus impacting the potential for agricultural development.

Proximity to existing forests is a key factor determining forest suitability, as areas near existing forests are more likely to develop into forested areas. This relationship, along with elevation and slope, varies between locations; higher elevations decrease forest potential in Arnavutköy but increase it in Hills Shire. Initially, slope was considered a critical factor in the forest suitability analysis for both locations, but iterative MCE analysis revealed it was only significant for Arnavutköy, leading to its exclusion as a factor in Hills Shire LGA. All the factors considered for both districts are listed in Tables 2 and 3.

Besides the factors, specific constraints restricting urban expansion were chosen for Arnavutköy district (existing built-up area, protected areas, watershed protection areas, and wetlands) and Hills Shire LGA (existing built-up area, protected areas, and watershed protection areas).

After identifying the factors, the next step involved standardization. The factors were converted to a continuous scale (0 to 255) using fuzzy functions and control points that define each fuzzy function as monotonically increasing, monotonically decreasing, or symmetrical, as detailed in Tables 2 and 3. Most factors were standardized using monotonically increasing or decreasing sigmoidal functions, which are known for their gradual transitions and are commonly employed in fuzzy set theory. Based on historical change trends, each factor was iteratively weighted to determine its significance. Through repeated testing, optimal weights were determined, as outlined in Table 4. Notably, standardization and weighting were applied only to the factors; the constraints were represented as Boolean maps (0 and 1).
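
A sigmoidal standardization of this kind can be sketched as follows. The sine-squared form is one common way to obtain an S-shaped curve between two control points and is used here as an assumption; it is not claimed to reproduce the software's exact implementation. The control points in the example (0 m and 5000 m) are hypothetical and are not taken from Tables 2 and 3.

```python
import math


def sigmoidal_increasing(x, a, b, scale=255):
    """Monotonically increasing S-shaped membership between a and b (a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return float(scale)
    t = (x - a) / (b - a)  # position between the control points, 0..1
    return scale * math.sin(t * math.pi / 2) ** 2


def sigmoidal_decreasing(x, a, b, scale=255):
    """Mirror image: full suitability at or below a, none at or above b."""
    return scale - sigmoidal_increasing(x, a, b, scale)


# Hypothetical factor: suitability falls with distance to roads (metres).
near = sigmoidal_decreasing(0, 0, 5000)
middle = sigmoidal_decreasing(2500, 0, 5000)
far = sigmoidal_decreasing(6000, 0, 5000)
```

With these control points, a cell adjacent to a road receives the maximum score of 255, a cell halfway between the control points receives 127.5, and any cell beyond 5000 m receives 0.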

Table 2 Standardization of selected factors for Arnavutköy district
Table 3 Standardization of selected factors for Hills Shire LGA
Table 4 User-defined weights for both study areas

Suitability maps for the minor LULC classes were generated by marking areas where they were present as suitable and areas where they were absent as unsuitable. These suitability maps were then standardized using an increasing sigmoidal fuzzy function. In Arnavutköy, barren land (3.73%) and wetland (0.78%) were identified as most suitable, standardized with an increasing sigmoidal fuzzy function. Likewise, in Hills Shire LGA, quarry (0.22%) and water (1.20%), with smaller coverage, were similarly identified and standardized.

Transition area matrix from the MC model

To create transition area matrices depicting expected changes between categories within each study area, Markov Chain analysis was conducted using two images (an initial t-1 image and a subsequent t image). For the 2017 LULC modelling in Arnavutköy district, the 1997 LULC map served as the initial image (t-1), and the 2007 LULC map served as the subsequent image (t). Similarly, for the 2018 LULC modelling in Hills Shire LGA, the 1996 LULC map was used as the first image (t-1), and the 2007 LULC map was used as the second image (t). Tables 5 and 6 present the transition area matrices for each study area, respectively. In the transition area matrix, off-diagonal elements indicate the pixels changing for each class, while diagonal elements show pixels remaining constant.

Table 5 Transition area matrix of Arnavutköy district (1997–2007)
Table 6 Transition area matrix of Hills Shire LGA (1996–2007)

In Table 5, the largest transformation in Arnavutköy district was from forest to agricultural area, involving 34,261 pixels. The most significant change to built-up area was from agricultural land, accounting for 26,934 pixels between 1997 and 2007. According to Table 6, in Hills Shire LGA, the greatest change, involving 21,560 pixels, was from forest to built-up area between 1996 and 2007, with agricultural areas also experiencing significant transformation into built-up land.

Contiguity filtering

In the final step, a 5 × 5 standard contiguity filter was used in the CA-Markov model to define the neighbourhood of each land cover class cell. Pixels close to existing areas of a given land use category were assigned higher suitability, whereas the suitability of pixels distant from that class was reduced, so that change was prioritized in neighbouring areas.
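The effect of the filter can be sketched as follows, assuming a uniform 5 × 5 kernel in which each pixel is weighted by the share of its neighbourhood already occupied by the class. The kernel used by the modelling software may weight cells differently; the function name and grid are hypothetical.

```python
def contiguity_weight(mask, size=5):
    """Return, for each cell, the fraction of its size x size window
    occupied by the class (mask values are 0 or 1). Cells outside the
    grid are treated as 0."""
    rows, cols = len(mask), len(mask[0])
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += mask[rr][cc]
            out[r][c] = total / (size * size)
    return out

# Hypothetical 7 x 7 grid with a single built-up pixel at the centre.
grid = [[0] * 7 for _ in range(7)]
grid[3][3] = 1
weights = contiguity_weight(grid)
```

Multiplying a class suitability map by these weights lowers the suitability of pixels far from existing patches of that class and leaves pixels inside large patches largely unchanged.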

Simulated LULC maps by CA-Markov model

By integrating all input data acquired in preceding processing steps, LULC maps were simulated using the CA-Markov model. Figures 7 and 8 show the simulated LULC maps for Arnavutköy district in 2017 and Hills Shire LGA in 2018. Additionally, area statistics for both the actual and simulated maps of Arnavutköy district and Hills Shire LGA were computed, aiding in the evaluation of simulation accuracy and the detection of any changes. The outcomes of this analysis are presented in Tables 7 and 8.

Fig. 7 LULC maps of Arnavutköy district for 2017: (a) simulated, (b) actual

Table 7 Statistics of actual and simulated LULC maps for Arnavutköy district in 2017
Fig. 8 LULC maps of Hills Shire LGA for 2018: (a) simulated, (b) actual

Table 8 Statistics of actual and simulated LULC maps for Hills Shire LGA in 2018

The findings presented in Fig. 7 and Table 7 indicate that the model is generally effective in capturing the spatial arrangement and distribution of most LULC categories, such as agricultural areas and forests. However, minor discrepancies in area between the simulated and actual LULC maps are observed, particularly an underestimation of the built-up class. A significant factor contributing to this discrepancy is the construction of Istanbul Airport, initiated in 2015. Because the CA-Markov model bases its simulations on transitions observed between the earlier (1997) and later (2007) dates, it could not anticipate specific subsequent changes such as the airport construction.

Conversely, as seen in Fig. 8, the simulated LULC map for Hills Shire LGA, showing the spatial distribution of LULC categories, aligns consistently with the actual LULC map. The statistics presented in Table 8 further validate this alignment, indicating that the area sizes of each class in the simulated LULC map of 2018 closely match those in the actual map, with minor deviations considered acceptable.

Model validation

To ensure the accuracy and reliability of predictions generated through simulations, model validation was undertaken. This involved comparing the simulated 2017 LULC map of Arnavutköy district with the actual 2017 map, and similarly, evaluating the simulated 2018 map of Hills Shire LGA against the actual 2018 map. Model validation results for both areas are detailed in Table 9.

Table 9 Model validation analysis

The validation results presented in Table 9, which range from 0.81 to 0.88, demonstrate strong agreement across all validation factors for both Arnavutköy district and Hills Shire LGA. Although agreement is slightly higher for Hills Shire LGA, the results overall indicate that the CA-Markov model predictions are robust.
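Agreement measures of this kind can be illustrated with the standard Kappa coefficient, sketched below for two flattened class maps. The toy labels and function name are hypothetical, and the location-specific Kappa variants offered by validation tools are not reproduced.

```python
from collections import Counter

def cohen_kappa(actual, simulated):
    """Standard Kappa: agreement between two maps corrected for the
    agreement expected by chance from the class proportions."""
    n = len(actual)
    observed = sum(a == s for a, s in zip(actual, simulated)) / n
    actual_counts = Counter(actual)
    simulated_counts = Counter(simulated)
    expected = sum(
        actual_counts[k] * simulated_counts[k] for k in actual_counts
    ) / (n * n)
    if expected == 1:
        return 1.0  # both maps contain a single identical class
    return (observed - expected) / (1 - expected)

# Hypothetical four-pixel maps with one disagreement.
actual_map = ["forest", "forest", "built-up", "built-up"]
simulated_map = ["forest", "forest", "built-up", "forest"]
kappa = cohen_kappa(actual_map, simulated_map)
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.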

Projection of LULC map for the year 2050

To project the LULC maps for the year 2050 for both study areas, the same process was applied as in simulating the 2017 and 2018 LULC maps. First, suitability maps for each LULC class were generated using the weights derived from the simulations of the 2017 Arnavutköy district map and the 2018 Hills Shire LGA map. Next, the Markov Chain transition area matrix (i.e., the 2007–2017 matrix for Arnavutköy district and the 2007–2018 matrix for Hills Shire LGA) was calculated for each area to create the 2050 LULC maps shown in Fig. 9.

Fig. 9 Projected LULC map for the year 2050: (a) Arnavutköy district and (b) Hills Shire LGA

To assess future LULC changes and urban expansion, a comparison was made between the area statistics of the 2017 map for Arnavutköy district (Table 10) and the 2018 map for Hills Shire LGA (Table 11) with those of the projected 2050 LULC map.

Table 10 Arnavutköy district LULC statistics (2017 and projected for the year 2050)

Table 10 shows a projected 44% growth in the built-up area, from 9,836 hectares in 2017 to 14,260 hectares in 2050. The primary expansion is expected towards the north, near Istanbul Airport, and in areas close to major road networks. Urban expansion is also foreseen near existing built-up regions, consistent with the disaggregated pattern of urban expansion reported by Aburas et al. (2021). The 2050 LULC model predicts minimal change in agricultural areas over 33 years, while forest areas are expected to decline by approximately 26%. Decreases are also projected for wetlands (8%) and barren land (19%). The substantial loss of forest, partly due to the construction of Istanbul Airport, is likely to significantly influence urban expansion patterns and to be a major contributor to overall forest degradation.

Table 11 Hills Shire LGA LULC statistics (2018 and projected for the year 2050)

Table 11 indicates significant changes in the urban land use of Hills Shire LGA. The built-up area is projected to increase by approximately 51%, expanding northward in line with future development and transport network improvements. Additionally, urban expansion is expected to primarily occur in already developed areas. The forested regions are projected to decrease significantly, with a loss of about 6,000 hectares between 2018 and 2050, equating to a 30% reduction over 32 years. Quarry and water areas are expected to remain relatively stable, while the agricultural area is anticipated to follow a trend similar to that of the Arnavutköy district, with no significant changes.

Discussion

Efficiency of the integrated CA-Markov model

The integrated CA-Markov model, applied in this study to predict urban expansion in Arnavutköy district and Hills Shire LGA, produced results consistent with the findings of Aburas et al. (2021), Karimi et al. (2018), and Hamad et al. (2018). As shown, the CA-Markov model is highly capable of predicting LULC change, but its inherent limitations need to be recognised and addressed. The model relies on transition probabilities that remain static over time, potentially overlooking dynamic and unpredictable factors influencing land use. Additionally, the model may not accurately capture the complexity of real-world processes affecting urban expansion. To enhance predictive capabilities, studies by Asif et al. (2023) and Theres et al. (2023) underline the importance of incorporating additional factors and combining different models. For instance, Gharaibeh et al. (2020) demonstrated substantial improvement in simulating and predicting land transformation by integrating the CA-Markov model with an ANN. This hybrid approach achieved an accuracy of 90.04%, outperforming the standalone CA-Markov model, which achieved an accuracy of 86.29%. Similarly, Aburas et al. (2017) employed several methods, including Analytical Hierarchy Process (AHP) and Frequency Ratio (FR), to enhance the CA-Markov model for predicting urban expansion. Their findings revealed that the CA-Markov model based on Frequency Ratio outperformed both the traditional CA-Markov and AHP-based CA-Markov models.

It is crucial to note that diverse factors such as image quality, processing methods, data types, and the selection of influencing factors affecting urban expansion trends have a substantial impact on algorithm efficiency and simulation accuracy, as highlighted by Aburas et al. (2017).

Socio-economic dynamics and urbanization

Economic, social, and political factors, particularly migration to megacities (i.e., seeking employment and improved living conditions), play a significant role in driving urbanization. This influence is evident in the districts examined in this study. The research highlights population growth as the primary driver behind the expansion of developed areas in both districts. In the Arnavutköy district, the population increased from 37,556 in 1997 to 62,492 in 2007 and then markedly to 261,655 by 2017. Similarly, the Hills Shire LGA experienced population growth from 118,797 in 1996 to 138,034 in 2007 and further to 173,067 in 2018, indicating significant demographic growth in both study areas. Notable differences in population growth rates are observed; Arnavutköy district experienced a growth rate of 596% between 1997 and 2017, while the Hills Shire LGA had a rate of 45% from 1996 to 2018. This comparison underlines the varying dynamics of urban expansion in these two regions. When comparing this data to the overall population growth rates of Istanbul (50%) and Sydney (24%) during the same periods, it becomes evident that both the Arnavutköy district and Hills Shire LGA are closely aligning with the urban expansion trends observed in their respective metropolitan areas. The high rate of population growth in each study area emphasizes the need for effective urban planning models at the district level in megacities.
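The growth rates quoted above follow from the population figures by simple percentage change, as the short check below shows. The function name is illustrative; the inputs are the populations reported in the text.

```python
def percent_change(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100

# Populations reported in the text for the first and last study years.
arnavutkoy_growth = percent_change(37_556, 261_655)   # 1997 to 2017
hills_shire_growth = percent_change(118_797, 173_067)  # 1996 to 2018

print(f"Arnavutköy: {arnavutkoy_growth:.1f}%")
print(f"Hills Shire LGA: {hills_shire_growth:.1f}%")
```

The computed values are approximately 596.7% and 45.7%, which correspond to the reported 596% and 45% when the decimals are dropped.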

The findings align with numerous studies (Vitousek et al. 1997; Khan et al. 2014; Kuchay et al. 2016; Al-Darwish et al. 2018; Dhali et al. 2019; Mahtta et al. 2022). Particularly noteworthy is the global study by Mahtta et al. (2022), which examined the relationship between urban expansion and population growth across 300 cities. Their research underscores that urban population growth has a more substantial influence on urban expansion than economic growth, with this effect becoming more pronounced after the year 2000.

Beyond population growth, factors such as industrial progress, environmental concerns, and urban planning have significantly impacted both cities. Specifically, urban policy documents, strategies, and targets developed for Istanbul do not accurately mirror the real-world outcomes of urban renewal and transformation projects. In essence, these initiatives failed to effectively control or manage the city’s expansion, leading to approximately 60% of the development being unplanned, unauthorized, low-cost, and incremental (Edelman 2021; Can 2022). In contrast, Sydney is adopting a more proactive approach to sustainable development. As part of the ‘Sustainable Sydney 2030’ project, the city aims to promote social connectivity and minimize environmental impact as it expands, with a focus on a green environment, integrated transport, and sustainable urban design (Hu 2015; Kornberger et al. 2021).

Conclusion

Addressing global challenges of urban expansion and population growth necessitates continuous monitoring of land use changes for effective planning. This study examines an integrated CA-Markov model to predict future LULC and urban expansion in two districts within two megacities, projecting outcomes for the year 2050. Prior to the modelling phase, an assessment of LST data was conducted to evaluate the environmental impacts of urban expansion.

The key findings can be summarized as follows: (i) The consistent increase in LST, especially in built-up areas, and the continuous decline in NDVI values over time in both districts indicate a decrease in vegetation cover, highlighting the environmental impact of urban expansion. (ii) Projections for 2050 show a significant expansion of built-up areas (44% in Arnavutköy and 51% in Hills Shire LGA), along with a projected reduction in forest cover (26% and 30% respectively). Agriculture and wetlands, however, are expected to remain stable in both districts. (iii) The integrated CA-Markov model proves its reliability in predicting 2050 LULC maps, achieving high Kappa coefficients of 0.87 for Arnavutköy district and 0.88 for Hills Shire LGA.

This study emphasises sustainable urban expansion by advocating comprehensive environmental planning in line with the United Nations’ Sustainable Development Goals, in particular Goal 11 on sustainable cities. In general, the model scenarios developed aim to enable decision-makers to monitor and manage the situations and challenges that future conditions may pose, and they underline the importance of integrating sensitive environmental plans and socio-economic data into future urban development. In this context, due to Sydney’s more proactive approach to sustainable development, the scenarios projected for Istanbul are expected to be more complex than those for Sydney. This complexity in Istanbul results from its unplanned and unauthorized rapid expansion, as well as its unpreparedness for potential major natural or man-made events and challenges, such as earthquakes or economic crises. Therefore, by providing key predictions on district-based urban expansion in fast-growing megacities, this study underlines the necessity for decision-makers to integrate sustainable development goals into local urban planning strategies.

In terms of future research, there are several planned objectives: (i) enhance the CA-Markov model’s simulation and prediction capabilities by incorporating socio-economic factors such as population growth and household income data, along with local governance dynamics; (ii) utilize satellite sensors with improved spatial resolutions for more accurate LULC mapping; and (iii) integrate various methods like AHP, ANN, etc., to improve the CA-Markov model’s ability to predict urban expansion.