1 Introduction

The railway network as a linear asset spread over a long distance and is susceptible to a range of weather impacts, including severe weather events, heatwaves, flooding, and rising sea levels, which ultimately lead to reduced availability, safety, and punctuality, as well as increased operation and maintenance costs (Forzieri et al. 2018; Thaduri et al. 2021; Garmabaki et al. 2021, 2022; IPCC 2022; Miller and Huntsinger 2023; Salimi and Al-Ghamdi 2020; Pour et al. 2020; Famurewa and Hoseinie 2016; Soleimani-Chamkhorami et al. 2024). Moreover, railway infrastructure plays a crucial role in facilitating safe and efficient transportation of people and goods to fulfil the future sustainable development goals (SDG) (Sachs et al. 2019). A significant portion of the railway network, including structures such as bridges, tunnels, and stations, was constructed without adequate consideration of future climate change impacts, such as rising temperatures, increased precipitation, and more frequent extreme weather events. As a result, it is a requirement to consider the potential impacts of climate change on railway infrastructure assets to mitigate the risks and ensure the longevity and resilience of this critical transportation system (OECD 2018). Therefore, proactive planning and investment in new technologies and infrastructure, such as improved drainage systems, weather-resistant materials, and more resilient tracks and rolling stock, as well as the development of new maintenance and inspection protocols, are necessary to ensure the safe and reliable operation of railways in the face of climate change. In general, two kinds of actions can be taken into account, including climate change mitigation and adaptation. Mitigation related to all activities leads to decreasing CO2 emissions; however, adaptation aims to reduce climate risks and vulnerabilities of existing infrastructures. Adaptation strategies for railway asset infrastructures can be grouped into (i) protect, (ii) accommodate, (iii) retreat, and (iv) avoid (Tyler 2015; IPCC., 2022; Kasraei et al. 2024).

One of the critical challenges associated with this strategy is the impact of changing weather patterns and climates across different geographic regions. Since the geographical and climatic conditions across a country such as Sweden can differ significantly, the management of transport infrastructures and assets can be impacted profoundly. To tackle this challenge, one possible solution is to divide the targeted area into various classes/groups that share significant similarities. This approach may make it easier to manage the effects of weather patterns and climatic consequences on railway assets. By categorizing regions based on their climate and geography, it becomes feasible to tailor infrastructure and asset management strategies to specific conditions, ultimately enhancing the overall resilience and efficiency of the railway network.

The paper’s objective is to partition Sweden’s railway network into distinct clusters with the greatest similarity, using meteorological and geographical attributes as criteria. Moreover, by employing this clustering approach, the paper intends to facilitate strategic comparisons among these clusters, thereby enabling the development of resource management strategies tailored to the specific characteristics of each cluster. This approach will help optimize resource allocation and enhance the efficiency of railway operations across Sweden’s diverse geographical and meteorological conditions.

This paper includes various sections as follows: in Sect. 2, different climate zone classifications are presented; next, in Sect. 3, the effect of climate change in Sweden is discussed, and the results for some cities are presented. Section 4 presents climate change’s impact on railway infrastructure. Thereafter, the research methodology and case study have been discussed in Sect. 5. Finally, Sect. 6, presents the main findings and future research directions.

2 Climate zone classification systems

There are some classification systems for climate zones include:

2.1 Köppen-Geiger system

The Köppen-Geiger system is a widely used climate classification method that identifies different climates based on temperature and precipitation patterns. It groups climates into five main categories, which are further divided into subtypes. The categories are tropical (A), dry (B), temperate (C), continental (D), and polar (E). The system assigns a combination of letters describing each climate zone, with the first letter indicating the major climate group and the second indicating the subtype. For example, a humid subtropical climate is classified as Cfa, where C represents temperate climates, f represents humid subtropical subtype, and A represents hot summer (Peel et al. 2007).

2.2 Hornthwaite climate classification system

This system classifies climates based on the water balance and potential evapotranspiration, it is often used in hydrology and water resource management to evaluate water availability and demand in a given region (Thornthwaite 1948; Feddema 2005).

2.3 Trewartha climate classification system

This system is based on the concept of seasonality and takes into account temperature and precipitation patterns throughout the year (Trewartha 1968).

2.4 Bergeron classification system

The Bergeron classification system is based on the relationship between temperature and precipitation. This system divides climates into four groups: polar, boreal, temperate, and tropical. Each group is further divided into subgroups based on temperature and precipitation characteristics(Chiu 2020).

2.5 Spatial synoptic classification (SSC)

The Spatial Synoptic Classification (SSC) system was developed by Mark S. Yarnal in the 1990s and is based on the synoptic (large-scale weather) patterns that produce local weather conditions. This system uses a combination of temperature, humidity, wind, and cloud cover to classify climates into six groups: dry tropical, moist tropical, dry mid-latitude, moist mid-latitude, dry polar, and moist polar(Cakmak et al. 2016).

2.6 Holdridge life zones

The Holdridge Life Zones system is another climate classification system that is based on the relationship between temperature, precipitation, and potential evapotranspiration. The Holdridge system divides the world into life zones based on three climatic variables: mean annual temperature, mean annual precipitation, and potential evapotranspiration. These variables are combined to create an index of biotemperature and a measure of aridity, which are then used to classify areas into one of 30 different life zones (Holdridge 1947).

3 Sweden’s climate change

Sweden is classified as a cold climate zone according to the Köppen-Geiger climate map (See Fig. 1), and its average temperature has risen by almost 2 degrees Celsius in comparison to the temperature to the end of the nineteenth century. In contrast, the global mean temperature has increased by approximately one degree Celsius. Figure 2 displays the charts of average temperature with bars, where the red bars represent temperatures higher than the average temperature for the normal period of 1961–1990, while the blue bars represent temperatures lower than the average. The chart also includes a black line showing a running mean calculated over approximately ten years. This curve clearly shows the continuous increase in temperature during the past two decades.

Fig. 1
figure 1

World map of Köppen-Geiger climate classification updated with mean monthly CRU TS 2.1 temperature and VASClimO v1.1 precipitation data for the period 1951–2000 on a regular 0.5degree latitude/longitude grid (Kottek et al. 2006)

Fig. 2
figure 2

Average temperature for different seasons from 1860 till 2020 (SMHI 2022)

There are several specific future climatic scenarios for the assessment of temperature variation, three of which are Representative Concentration Pathways (RCPs) scenarios, including RCP2.6, RCP4.5, and RCP8.5. Based on RCP2.6, greenhouse gas emissions will start to decline in 2020 and reach zero at the end of the century. In RCP4.5, emissions continue to around 2040 and decline afterward. In RCP8.5, emissions continued to increase until the end of the century. Compared to today, the estimated warming to the end of the century is roughly 1 °C, 2 °C and 4 °C for RCP2.6, RCP4.5 and RCP8.5, respectively (IPCC 2014). Figure 3 shows the projected temperature changes in summer for four cities in Sweden (Luleå: northeast; Kiruna: northwest; Malmö: Southeast; Stockholm: middle) according to three different RCP scenarios. The warming signal is everywhere in all scenarios, and warming until the end of the century is 2 °C (RCP2.6) to 4 °C (RCP8.5). At the end of the century, Luleå and Kiruna may have the same average summer temperature as Stockholm.

Fig. 3
figure 3

Projected summer temperature for four cities in Sweden

Figure 4 shows the projected annual precipitation for four cities in Sweden (Luleå: northeast; Kiruna: northwest; Malmö: Southeast; Stockholm: middle) according to three different RCP scenarios. The annual precipitation is predicted to increase everywhere, but the northern regions show larger changes and variability (Kasraei et al. 2024).

Fig. 4
figure 4

Projected precipitation for four cities in Sweden (Kasraei et al. 2024)

The following conclusions have been made for Sweden under different RCPs (SMHI):

  • High temperatures (above + 25-degree) will increase, especially in the south (most considerable difference in the north),

  • Zero-crossing will decrease in the south and increase in the north during winter. The north will experience more rain and more snow at temperatures close to 0 °C,

  • The amount of snow will decrease in general but increase in the north

  • Rain will increase within the whole country.

4 Impact of climate on railway infrastructure

Weather conditions such as temperature and precipitation can significantly impact railway infrastructure, causing severe damage. For example, high temperatures can cause the rails to buckle or expand, leading to structural damage. The proxy for high temperatures is summer days, defined as days with maximum temperature > 25 °C. A consequence of a warmer climate is more warm days. The number of summer days is projected to increase with 10–20 until the end of the century in southern Sweden, around twice the amount of today. Similarly, heavy rainfall or snowfall can cause infrastructure slope failures, track misalignment, and bridge scour. Additionally, extreme weather conditions can also cause damage to the catenary line and signal equipment, making it difficult for trains to operate smoothly (Palin et al. 2021; Stenström et al. 2012; Garmabaki et al. 2022). Flooding may occur due to intense short-time precipitation or large precipitation amounts over longer periods. Heavy precipitation is projected to increase in the future. Extreme showers are expected to increase by around 40% by the end of the century. The number of days with heavy or extreme precipitation are also projected to increase.

Extreme weather conditions, such as heavy rainfall, snowfall, freezing temperatures, and high winds, can lead to delays and even failures in railway infrastructure in northern Europe. Research has indicated that adverse climate conditions account for 5–10% of total failures and 60% of delays in the railway system in this region (Garmabaki et al. 2021). To address this issue, rail operators and infrastructure managers may need to invest in measures such as improved drainage, better insulation, more resilient track materials, and enhanced maintenance protocols. Additionally, contingency plans and procedures may need to be developed to respond effectively to weather-related disruptions. However, the specific impact of adverse climate conditions on rail infrastructure may vary depending on various factors, including geography, topography, and types of infrastructure and equipment used in different regions. Therefore, further research and analysis may be required to fully understand the relationship between climate conditions and rail performance. Figure 5 illustrates two climate-related incidents that occurred in Sweden in 2023, leading to injuries and disruptions in the railway network.

Fig. 5
figure 5

a Washed away track and b flooded track

5 Methodology

Given the potential impact of adverse climate conditions on railway assets, it is crucial to consider the diverse climatic zones in different regions and their effects on the railway infrastructure. In this context, this paper seeks to categorize the areas of Sweden into different sections, taking into account various climatic parameters such as temperature and the type of railway assets, including stations. To achieve this goal, unsupervised machine learning techniques such as K-Means will be utilized to cluster and group different areas based on their similar climatic conditions and railway assets (Kasraei and Ali Zakeri 2022; Kasraei et al. 2021). The use of machine learning techniques in this study can provide valuable insights into the relationships between climatic zones and railway infrastructure in Sweden. By analyzing the data on temperature and railway assets, the study can identify the areas most vulnerable to adverse weather conditions and require more attention and investment in adaptation and resilience measures. Furthermore, using unsupervised machine learning techniques such as K-Means can identify patterns and trends that may not be easily identifiable through traditional statistical methods. This approach can also help overcome the limitations of human judgment and bias, providing a more objective and data-driven perspective on categorizing different climatic zones and railway assets. This study focuses on railway S&Cs which are distributed over the network. S&C is one critical asset in the railway infrastructure networks as shown in Fig. 6. When the switching mechanism is initiated, the switchblade moves to its opposite position in order to divert the train in another direction. Many failures can cause S&C malfunction, and black box approaches have been followed while performing failure analyses.

Fig. 6
figure 6

Illustration of switches and crossings (Nissen 2009)

In addition, based on knowledge created in our previous projects and discussions with experts, temperature has been selected as the main climatic factor with a high impact on the railway infrastructure network.

This study includes the following subsections:

Railway asset specifications: This section involves gathering information on the railway network map of Sweden and selecting over forty railway stations located in different parts of the country. The GPS specifications of these assets are then determined and recorded for further analysis.

Meteorological data gathering and pre-processing: This section outlines the process of selecting weather stations based on the identified railway stations. The desired meteorological data is then gathered from open sources, and temperature records from the past 20 years are collected and analyzed using data from VviS and SMHI.

Clustering algorithms: This section explains the use of K-means clustering techniques to analyze the gathered data. The clustering technique groups the data based on their similarities, allowing for easier interpretation and identification of patterns in the data.

Reliability analysis: This section involves performing reliability analysis on the clustered data. After grouping the data into different clusters, the reliability of each cluster is assessed.

The framework of this paper is depicted in Fig. 7. The diagram illustrates the different components and their relationships in the proposed framework.

Fig. 7
figure 7

Framework of the study (Kasraei et al. 2023)

5.1 Data gathering and pre-processing

The present study involves collecting and analyzing diverse data sets about the railway network in Sweden. Specifically, the study includes the selection of forty railway stations from various parts of the network, and their locations have been identified for further analysis. In addition, meteorological data from weather stations close to the selected railway stations have also been collected. This data includes temperature readings, and records have been gathered over a period of more than 20 years. Including these diverse data sets allows for a comprehensive analysis of the railway network and its associated environmental factors. By examining the temperature records over an extended period, the study can identify any trends or patterns that may be present in the data.

5.2 Clustering according to climate zones

K-Means is an unsupervised machine learning algorithm that clusters and groups data points. The algorithm identifies the centroid of each cluster and then iteratively reassigns data points to the nearest centroid until the clusters stabilize. The “K” in K-Means refers to the number of clusters the algorithm aims to identify. This approach consists of the following steps:

  • Select the number of clusters (K)

  • Initialize the centroids of each cluster randomly

  • Assign each data point to the nearest centroid based on the Euclidean distance between the data point and the centroid

  • Recalculate the centroid of each cluster as the mean of all the data points assigned to that cluster.

  • Repeat steps 3 and 4 until the centroids no longer change or until a maximum number of iterations is reached.

  • The algorithm outputs the final clusters, each containing the data points assigned to the same centroid.

The number of clusters needs to be determined at the beginning of the K-Means method. The Elbow method is commonly used for this purpose. As shown in Fig. 8, the optimal number of clusters for the given dataset was determined to be four. Using this parameter (K = 4), the K-Means technique was implemented using the Spyder software.

Fig. 8
figure 8

Using the Elbow method to determine the optimal number of clusters for K-means algorithm

In the next step of the analysis, the dataset was clustered based on several features. Specifically, time series of temperature for each railway station were considered, and various parameters were extracted from these time series. These parameters included the mean temperature, standard deviation (std), skewness, kurtosis, as well as the geographic coordinates of the station (i.e., latitude, longitude) and its height above sea level. This approach allowed for a more comprehensive analysis of the temperature data, taking into account the temporal patterns and spatial characteristics of the data. By considering a range of features, the resulting clusters were able to capture the underlying structure of the data and identify groups of similar observations.

Furthermore, including geographic coordinates and height above sea level in the clustering process may reveal patterns related to topography and microclimate, which are known to affect temperature variations. This information could be useful for understanding the spatial distribution of temperature patterns and potentially identifying areas that are more susceptible to extreme temperature events. By leveraging multiple features in the clustering process, a more nuanced understanding of the temperature data can be obtained, leading to insights that may not be apparent from a single feature analysis. Figure 9 displays the outcome of the grouping process, which reveals the presence of four distinct clusters encompassing a total of 40 railway stations located throughout Sweden. The clusters shown in the figure are divided by green lines, and each cluster’s members are indicated by four different colours (green, red, blue, and yellow).

Fig. 9
figure 9

Depicting selected railway stations on various clusters over Sweden (Google earth)

5.3 Reliability analysis and discussion

5.3.1. Trend analysis.

The statistical trend test technique is utilized to assess the presence of any patterns or trends in cumulative failure times of a particular system over time. To evaluate trends in cumulative failure time, various statistical tests such as the Laplace trend test, Military Handbook test, and Anderson–Darling test can be employed. These tests analyze the data for a monotonic trend, which refers to a consistent increase or decrease in the cumulative failure time over time. If the test indicates a significant increase in the cumulative failure time, this may suggest that the system is becoming less reliable and may require maintenance or redesign.

Figure 10 depicts the trend in failures occurrence which, performed based on the clustering outcome given in subSect. 5.2.

Fig. 10
figure 10

The presentation of trend test results graphically

It can be observed that over time, there is a gradual increase in the curve representing the reduction in the reliability performance of the assets.

In trend tests, the null hypothesis is a statement of trend-free or no significant pattern in the data being analyzed. Table 1 shows the results of the statistical analysis that in all statistical tests, the null hypothesis is rejected for all cases (except the pooled Laplace’s for cluster 3, and the nonhomogeneous Poisson process (NHPP) is utilized for reliability analysis in the next step (Garmabaki et al. 2016).

Table 1 Trend test results for different clusters

The results of statistical analysis confirm the interpretation of the above curve, and P-value indicates that in all statistical tests, the null hypothesis is rejected except the pooled Laplace’s for cluster 3, and the cumulative failure time shows the trend, and the NHPP will be utilized for reliability analysis in next step.

5.3.1 Nonhomogeneous poisson process (NHPP)

The NHPP is associated with an intensity function (Eq. (1)) that signifies the rate of failures.

$$\lambda \left( t \right) = \frac{\beta }{\theta }*\left( {\frac{t}{\theta }} \right)^{\beta - 1}$$
(1)

The shape parameter (β) value determines whether the system improves, deteriorates, or remains constant over time. The value of β indicates an increasing failure rate, meaning the system is deteriorating. Under the NHPP case, the number of failures can be estimated through the integration of \(\lambda \left( t \right)\) and the reliability function can be approximated as given in Eq. (2):

$$\begin{array}{*{20}c} {m\left( t \right) = \mathop \smallint \limits_{0}^{t} \lambda \left( s \right)ds} \\ {R\left( t \right) = \exp \left( { - m\left( t \right)} \right)} \\ \end{array}$$
(2)

The reliability function, denoted by R(t), and the intensity function at time s, denoted by λ(s), are related through the cumulative intensity function, represented as \(\mathop \smallint \limits_{0}^{t} \lambda \left( s \right)ds\). According to the previous analysis, the shape and scale for the failure time are presented in Table 2.

Table 2 Parameters estimation for different clusters

Using the Eq. (1), Eq. (2), and the result, which are presented in Table 2, the reliability curves of four clusters are presented in Fig. 11.

Fig. 11
figure 11

Reliability curves for selected assets over time under influence of different climate zones

Notably, the data set under consideration comprises four distinct clusters, labelled Cluster1, Cluster2, Cluster3, and Cluster4, and encompasses 2,002 individual assets. These assets experienced a total of 24,738 failures over the course of the 18-year observation period. Table 3 reveals that Cluster 4, consisting of assets located in the southern region of Sweden, accounted for over 56 percent of all assets and nearly 60 percent of total failures. The data presented in Table 2 highlights the remarkable similarity between Cluster 4 and the integrated scenario, including all assets, as evidenced by the fact that Cluster 4 contributed to almost 60 percent of all observed failures. Furthermore, Table 3 (last column) shows that the distribution ratio of the number of failures per asset is approximately the same for all the clusters and integrated scenario. In addition, the ratio of assets per cluster and the ratio of number of failures per cluster are approximately the same (See columns three and five). Hence, a weighted combination of cluster failure parameters (shape parameter) provides a reasonably accurate approximation of this parameter for the reliability analyses of whole assets. This will help the infrastructure manager with tactical and strategical decision-making utilizing the cluster’s combined pattern, considering the ratio weighted for reliability, quality management, and maintenance planning.

Table 3 Distribution of assets and failures in different clusters

In Fig. 11, comparing the reliability behavior of the whole assets (black curve) and other clusters is evident that the clustering technique can lead to a reduction in uncertainty in modeling, as demonstrated by the varying reliability behaviors of the different clusters. This finding is of great significance in the field of machine learning for climate adaptation measures and risk assessment, where accurate and reliable models are crucial for effective decision-making. By clustering S&Cs based on their meteorological feature, the analysis provides insight into the underlying patterns and behaviors of these assets, which can assist in developing more robust and accurate models. Specifically, the reliability analysis conducted on the different clusters reveals that different groups of assets exhibit different reliability behaviors, with some being more reliable than others over time.

Figure 12 illustrates the differences between the integrated scenario and the clustering approach that divides the assets into four different groups based on their meteorological features. The results demonstrate that the number of failures at t = 40,000 h for each of the four clusters and the whole assets vary, with cluster 2 having the lowest (2.50) and cluster 3 having the highest (4.00). These differences highlight the importance of considering assets’ specific characteristics and environmental conditions when developing machine learning models for risk assessment and climate adaptation measures. Furthermore, the similarities between the number of failures for the whole assets (3.25) and cluster 4 (3.28) suggest that this cluster is representative of the overall behavior of the assets.

Fig. 12
figure 12

Expected number of failures curves for selected assets over time under influence of different climate zonesf

6 Conclusion

Machine learning is a practical tool to integrate climate adaptation strategies and risk assessments due to the extensive distribution of assets across diverse geographical areas, the varied characteristics of the assets, and their exposure to fluctuating meteorological conditions. This research focuses on 2002 S&Cs installed in different railway stations in Sweden. Temperature, a crucial meteorological factor associated with the coordination and altitude of railway stations, was chosen to categorize the railway assets into distinct climatic zones using the K-means technique. This technique resulted in four various clusters. Subsequently, reliability analysis was carried out for these clusters and an integrated scenario. The analysis revealed that each cluster exhibits distinct behaviors, with assets in Cluster 2 demonstrating the highest reliability and those in Cluster 3 showing the lowest over time. Our findings indicated similarities in reliability parameters between cluster 4 and the integrated scenario (whole assets).

The clustering approach employed in this study enables infrastructure managers to incorporate climate parameters into asset health performance assessments (reliability analysis), facilitating a deeper understanding of asset behavior under diverse meteorological and geographical conditions.

Looking ahead, our future research plans involve expanding the proposed methodology to include additional influential climatic factors, such as humidity, snow depth, rainfall amount, and wind speed, in combination with operational features like the age of assets, track types, gross tonnage, and speed.