A Hybrid Similarity-Based Method for Wind Monitoring System Deployment Optimization Along Urban Railways

Urban railways in coastal areas are exposed to the risk of extreme weather conditions. A cost-effective and robust wind monitoring system, as a vital part of the railway infrastructure, is essential for ensuring safety and efficiency. However, insufficient sensors along urban rail lines may result in failure to detect local strong winds, thus impacting urban rail safety and operational efficiency. This paper proposes a hybrid method based on historical wind speed data analysis to optimize wind monitoring system deployment. The proposed methodology integrates warning similarity and trend similarity with a linear combination and develops a constrained quadratic programming model to determine the combined weights. The methodology is demonstrated and verified based on a real-world case of an urban rail line. The results show that the proposed method outperforms the single similarity-based method and spatial interpolation approach in terms of both evaluation accuracy and robustness. This study provides a practical data-driven tool for urban rail operators to optimize their wind sensor networks with limited data and resources. It can contribute significantly to enhancing railway system operational efficiency and reducing the hazards on rail infrastructures and facilities under strong wind conditions. Additionally, the novel methodology and evaluation framework can be efficiently applied to the monitoring of other extreme weather conditions, further enhancing urban rail safety.


Introduction
As network speeds and coverage increase, the railway system has become more vulnerable to the complex and changeable natural environment. Extreme weather such as strong wind is one of the main factors compromising the safe and normal operation of railways [1,2]. Since the opening of the Lanzhou-Xinjiang and Southern Xinjiang railway lines, there have been over 30 wind-induced incidents of train derailment and overturning, resulting in nearly 100 overturned cars and direct losses of over 10 million Chinese yuan [3]. Severe incidents caused by strong winds have occurred approximately once every 1-2 years in the Xinjiang region. Similar incidents of derailment and train overturning have been reported globally, including in the USA [4] and Switzerland [5].
In order to mitigate wind-induced risk, natural hazard monitoring systems, of which the strong-wind warning system is a critical component, have been developed for railways [6,7]. The system works as follows. Real-time wind speed data are collected by sensors deployed alongside the railway and transmitted to a server. Through analysis, warning information is generated for a possible strong wind gust. Finally, the dispatcher imposes a safe speed on the train [8]. Since the anemometers are discretely located along the line, once strong wind is observed at a monitoring spot, the alarm range automatically covers the track sections between its two adjacent spots.
However, inappropriate deployment of the monitoring spots may cause problems during operation. A low-density wind sensor layout may lead to system failures in situations such as complicated terrain and monsoon zones. In mountainous areas, where wind conditions differ from those over homogeneous terrain, a sparse anemometer layout has limited capability to capture the spatial characteristics of the airflow [9]. Consequently, overturning accidents have still occurred in some areas where the monitoring system had already been built [10]. Moreover, in coastal areas where strong winds frequently occur, frequent strong-wind alarms adversely affect the punctuality and efficiency of trains. For instance, in 2015, over 360 strong-wind warnings were issued in Guangdong Province, China [11]. Given the characteristics of urban railway systems (e.g., concentrated passenger flow, limited space, and closed operation), such interruptions caused by extreme weather can lead to significant delays [12]. Meanwhile, a low-density anemometer deployment may drive operators to adopt more conservative operating strategies: a warning imposes a speed limit over a longer section than it would with a dense anemometer deployment. Under these conditions, an inadequate density of anemometers along railways in windy areas enlarges the affected areas and causes greater economic losses.
Previous standards in the railway industry and academic research have proposed various approaches to optimize the deployment of anemometers. Nevertheless, how to properly deploy new anemometers in an existing system remains an open question. Existing standards specify a density requirement for anemometers along railways. According to the Chinese high-speed railway design code, the distance between adjacent wind stations should be 1-5 km in high-risk areas such as mountains, canyons, and valleys, and 5-10 km for bridges and high embankments [13]. As the standard considers terrain the sole determinant, it ignores other factors such as railway construction conditions and lacks a quantitative methodology based on historical meteorological data for determining the positions.
In existing academic papers, four considerations have been summarized for deploying anemometers along railways: (a) historical wind-induced accidents, (b) historical meteorological data, (c) computational fluid dynamics (CFD) results, and (d) feasibility [14]. Adding new sensors according to historical accidents is a passive measure, which is not recommended in risk management [15]. In contrast, making decisions based on measurement data is more reliable. A direct approach included in the present standard is to add new anemometers in the sections where the historical wind speed exceeds a threshold [16]. However, the spot where the extreme wind speed is observed is not necessarily the spot where a new wind station is needed. In other words, the standard does not provide instructions on how to add new wind stations along the "blank segments" without measurement data. Better approaches for locating high-risk points in "blank segments" based on existing data are therefore needed.
Spatial interpolation and signal correlation are the two main categories of methodology applied in site suitability analysis. Spatial interpolation methods are commonly used in meteorology-related research to generate continuous data from point samples [17]. Baseer et al. proposed a site selection model for wind farms in which the inverse distance weighted (IDW) interpolation technique was used to estimate wind speed between wind stations where no directly measured data were available [18]. Ye et al. compared the performance of deterministic and geostatistical interpolation methods based on a dataset of multiyear wind records from 235 meteorological stations in Canada; the results confirmed the validity of spatial interpolation methods in extreme wind analysis [19].
Evaluating the signal similarity of existing monitoring spots is another way to determine new installation locations. Clustering analysis is a common technique used in signal processing and feature recognition [19,20]. Cao et al. applied clustering analysis to sensor deployment in the building environment. The clustering centers were set as the sensor locations, which incorporates the hidden hypothesis that the points with the largest differences should be identified as monitoring locations [21]. The clustering technique can efficiently handle the similarity evaluation of high-dimensional scattered data points. However, its efficacy may be limited for the linear layouts found along railways. Du drew inspiration from the alerting function of anemometers and introduced the concept of "warning similarity" to measure differences in warning behavior between adjacent monitoring locations. If the alert behaviors of two adjacent anemometers on a railway stretch are similar, the installation of additional anemometers within that section is deemed unnecessary [22]. This method enables the assessment of each segment along the railway and is particularly suitable for determining the sections in which to deploy new anemometers. Nevertheless, the measurement data are substantially affected by local environmental factors, as verified by numerous studies that investigate the local wind patterns of railway profiles using CFD [23,24]. Specifically, the peak values of observed wind speed may be reduced by surrounding buildings and objects, leading to considerable divergence in the warning indications of two adjacent anemometers. Therefore, because of such complex wind patterns, warning similarity alone is inadequate for determining the need for installation.
To investigate more reliable and robust approaches, this paper improves on the existing warning similarity-based method by integrating another indicator, which measures the correlation between the general trends of two wind signals, to determine the railway sections for additional anemometer deployment. Additionally, an optimization model is developed to obtain the optimal weights for combining these two similarity metrics. The proposed methodology provides a useful tool for optimizing existing railway wind monitoring systems with additional monitoring equipment. The installation scheme can be derived directly from available historical wind speed data, without costly field tests or complicated simulation experiments. The method is demonstrated and validated on a real-world case. The results show that the proposed method is more robust than warning similarity, trend similarity, or spatial interpolation alone.

Methodology
In this section, we present the methodology and theoretical background of our research, which is outlined in Fig. 1. The input comprises the historical data of each anemometer positioned along the railway, and the output is a priority ranking of the intervals between existing anemometers according to their hybrid significance. The underlying principle is that an additional monitoring spot is needed where the data obtained from two adjacent anemometers differ significantly. To evaluate the similarity of two wind speed series, Du [22] proposed warning similarity. However, the warning similarity metric alone may not support accurate decisions, as it disregards the impact of local environmental factors. Therefore, we propose augmenting the evaluation with trend similarity, which is less affected by the surrounding environment. Finally, to rank the significance of railway sections, a hybrid significance score, defined as a weighted average of the warning similarity and trend similarity measures, is computed. To determine the optimal weights, a quadratic programming model is developed to minimize the difference between the output significance and the ground truth.

Data Preprocessing
The acquisition of wind speed data is often plagued by problems such as data anomalies and missing data due to mechanical malfunctions [25,26]. It is therefore essential to preprocess each series before incorporating it into our model.
As for data anomalies, the standard procedure entails discarding recordings with excessively high or low values. Abnormally high values can be excluded by setting a fixed threshold and removing readings that exceed it. However, this approach cannot be used to detect abnormally low values, because periods of windless weather may be genuine. Detecting such instances therefore requires a comparative analysis of adjacent wind speed series within a time frame. Specifically, if an anemometer reports an uninterrupted 0 m/s wind speed while adjacent sensors exhibit typical wind speed fluctuations, this suggests abnormally low values.
Fig. 1 The methodology pipeline of this research
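The two anomaly rules above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. The function names are hypothetical, the series are assumed to be aligned 1-min lists of equal length, and windows are assumed to be non-overlapping. The numeric limits (32.7 m/s, 60 samples, 0.2 m/s, 3 m/s) are borrowed from the case study.

```python
from typing import List, Sequence

HIGH_LIMIT = 32.7  # m/s, hurricane-force wind speed (case-study value)
CALM_LIMIT = 0.2   # m/s, Beaufort force 0
GAP_LIMIT = 3.0    # m/s, mean gap to neighbours that marks a stuck sensor
WINDOW = 60        # samples; 60 x 1-min readings = 60 min

def flag_high(series: Sequence[float], limit: float = HIGH_LIMIT) -> List[bool]:
    """Flag individual readings above a fixed physical limit."""
    return [v > limit for v in series]

def flag_low(target: Sequence[float], neighbours: List[Sequence[float]],
             window: int = WINDOW) -> List[bool]:
    """Flag windows where the target reads calm but its neighbours do not.

    Non-overlapping windows are used for simplicity (an assumption);
    a trailing partial window is left unflagged.
    """
    flags = [False] * len(target)
    for start in range(0, len(target) - window + 1, window):
        stop = start + window
        if not all(v < CALM_LIMIT for v in target[start:stop]):
            continue
        # Mean absolute gap between each neighbour and the target in this window
        gaps = [sum(abs(n[t] - target[t]) for t in range(start, stop)) / window
                for n in neighbours]
        if gaps and sum(gaps) / len(gaps) > GAP_LIMIT:
            flags[start:stop] = [True] * window
    return flags
```

Flagged readings would then be removed together with the same time slice of the adjacent series, as the text requires.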
In dealing with missing data, a critical initial step is to assess the extent of the missing data. Excessive missing data can degrade the precision and validity of subsequent analyses. Therefore, missing segments exceeding a certain proportion of the total data are considered for deletion. When these segments are removed, the same periods must also be removed from the adjacent wind speed signals for subsequent comparisons. If the data loss is acceptable, however, the gaps can be filled with interpolated values. Because of the high volatility of wind speed, high-order interpolation approaches have significant limitations in fitting the data accurately and may exhibit the Runge phenomenon, generating spurious values [27]. A simple linear interpolation method is thus preferred.
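A minimal sketch of this missing-data handling follows. It assumes that missing readings are stored as `None` in a Python list, and the helper names are hypothetical. `longest_gap_ratio` supports the deletion criterion (compared against a threshold such as the 5% used in the case study), and `linear_fill` performs straight-line interpolation across interior gaps only.

```python
from typing import List, Optional, Tuple

Series = List[Optional[float]]

def missing_runs(series: Series) -> List[Tuple[int, int]]:
    """(start, end) index pairs, end exclusive, of consecutive missing readings."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(series):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(series)))
    return runs

def longest_gap_ratio(series: Series) -> float:
    """Length of the longest missing run as a share of the whole series."""
    return max((e - s for s, e in missing_runs(series)), default=0) / len(series)

def linear_fill(series: Series) -> Series:
    """Fill interior gaps on the straight line between the bounding readings.

    Gaps touching either end have no anchor on one side and are left as None.
    """
    out = list(series)
    for start, end in missing_runs(series):
        if start == 0 or end == len(series):
            continue
        left, right = series[start - 1], series[end]
        span = end - start + 1
        for i in range(start, end):
            out[i] = left + (right - left) * (i - start + 1) / span
    return out
```

Because each filled value lies on the segment between two real readings, it can never leave their range. This is the property the case study cites in favour of linear over spline or polynomial interpolation.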

Warning Similarity
When two neighboring wind monitoring stations consistently sound the alarm simultaneously, the entire section between them will be controlled irrespective of the presence of an additional anemometer in that section.Hence, the warning similarity between two adjacent locations serves as an important indicator for determining the need to install another monitoring station between them [22].

Strong Wind Warning Mechanism
The warning mechanism comprises the protocols regulating the initiation and termination of an alert. According to the Japanese standard [28], once the wind speed detected at a monitoring spot exceeds the predetermined threshold, its adjacent sections enter an alarm state. The alert is terminated once no wind speed exceeding the threshold has been reported within the preceding 30 min.
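The initiation and termination rule can be turned into alert intervals as sketched below. This is an illustrative reading of the rule that assumes 1-min samples indexed by minute and a strict `>` comparison with the threshold. It also assumes that an alert ends `hold` minutes after its last exceedance. These boundary conventions are assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def alerts(speeds: Sequence[float], threshold: float,
           hold: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) minute indices of strong-wind alerts.

    An alert starts at the first reading above `threshold` and ends `hold`
    minutes after the last exceedance; an exceedance within `hold` minutes of
    the previous one extends the current alert instead of opening a new one.
    """
    intervals: List[Tuple[int, int]] = []
    start = last = None
    for t, v in enumerate(speeds):
        if v <= threshold:
            continue
        if start is None:
            start = t                      # first exceedance opens an alert
        elif t - last > hold:              # previous alert expired before t
            intervals.append((start, last + hold))
            start = t
        last = t
    if start is not None:
        intervals.append((start, last + hold))
    return intervals
```

The resulting `(start, end)` pairs play the role of $(s^i_j, e^i_j)$ in the concurrence calculation that follows.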

Concurrence Degree of Alerts
We quantify the warning similarity by calculating the extent of concurrence between two series of alerts. For adjacent anemometers $A_i$ and $A_{i+1}$, let $s^i_j$ and $e^i_j$ denote the starting and ending times of alert $j$ of $A_i$, and let $s^{i+1}_k$ and $e^{i+1}_k$ denote those of alert $k$ of $A_{i+1}$. The alert durations are $\Delta t_j = e^i_j - s^i_j$ for alert $j$ and $\Delta t_k = e^{i+1}_k - s^{i+1}_k$ for alert $k$. Whether the two alerts are concurrent is determined by the criterion in Eq. (1).
If this criterion is met, the similarity of alerts $j$ and $k$ is quantified by Eq. (2), which combines the proportions of the overlapping segment within both alert durations (Fig. 2).
where $t_{\max} = \max(e^i_j, e^{i+1}_k)$ and $t_{\min} = \min(s^i_j, s^{i+1}_k)$.

Aggregation of Alert Concurrences
The final step in determining warning similarity is to aggregate all the alert concurrences into a single value. Assuming that $n_i$ and $n_{i+1}$ alerts are recorded in the historical data of $A_i$ and $A_{i+1}$, respectively, a total of $n_i n_{i+1}$ alert pairs are evaluated. The warning similarity $Rw_{i,i+1}$ between $A_i$ and $A_{i+1}$ is then quantified by Eq. (3).
Fig. 2 Concurrence of two strong wind alerts
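The sketch below illustrates the concurrence idea under explicit assumptions rather than reproducing Eqs. (1)-(3) exactly. Two alerts are treated as concurrent when their intervals overlap. The pairwise score is assumed to be the mean of the overlap's share in each alert's duration. The aggregate is assumed to be a plain mean over all $n_i n_{i+1}$ pairs. The function names are hypothetical. The overlap is written with the text's $t_{\max}$ and $t_{\min}$.

```python
from typing import List, Tuple

Alert = Tuple[float, float]  # (start, end) on a shared time axis, e.g. minutes

def concurrence(a: Alert, b: Alert) -> float:
    """Pairwise concurrence of two alerts (assumed form, not the exact Eq. (2))."""
    (s_j, e_j), (s_k, e_k) = a, b
    dt_j, dt_k = e_j - s_j, e_k - s_k
    t_max, t_min = max(e_j, e_k), min(s_j, s_k)
    overlap = dt_j + dt_k - (t_max - t_min)    # equals min(e) - max(s)
    if overlap <= 0 or dt_j <= 0 or dt_k <= 0:  # intervals do not overlap
        return 0.0
    # Assumption: average of the overlap's share in each alert's duration
    return 0.5 * (overlap / dt_j + overlap / dt_k)

def warning_similarity(alerts_i: List[Alert], alerts_next: List[Alert]) -> float:
    """Mean concurrence over all n_i * n_{i+1} alert pairs (assumed aggregation)."""
    if not alerts_i or not alerts_next:
        return 0.0
    total = sum(concurrence(a, b) for a in alerts_i for b in alerts_next)
    return total / (len(alerts_i) * len(alerts_next))
```

Identical alerts score 1, disjoint alerts score 0, and partial overlaps fall in between. Because the metric is Z-score standardized later, its absolute scale matters less than its ordering across sections.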

Trend Similarity
If two adjacent anemometers record wind speed series with comparable trends despite inconsistent absolute values, it is plausible to conclude that the wind patterns in the region between them are alike. The peak wind speeds observed at a monitoring spot may be reduced by local environmental factors such as high-rise buildings. In this case, a high degree of similarity in the trends of the wind signals suggests a persistent and uniform wind pattern within the section, notwithstanding the discrepancy in their warning data.
Traditional correlation measures such as the Pearson and Spearman coefficients provide a framework for evaluating the degree of covariation between two continuous time series [29]. Nevertheless, owing to the complexity and volatility of wind speed data, applying these methods directly may not yield adequate results [30]. Therefore, in this study, we combine windowed correlation with a moving average procedure to improve the estimation of trend similarity.

Windowed Correlation
The length of the input time series constitutes a crucial aspect that affects the outcome of the similarity measures.If we apply correlation measures to the entire historical data, the outcome is limited to reflecting trend similarity in large time periods, overlooking characteristics of the trend in finer temporal scales.One solution to this issue is to compute the trend similarity over smaller time windows and aggregate the results from all the windows.
This study uses all identified strong-wind warning segments as the target windows, for two reasons. First, using the warning segments as the basis for calculation keeps the data segments consistent and facilitates the integration of the trend and warning similarity metrics. Second, selecting data from strong-wind segments minimizes the long, monotonic wind sequences that are common in calm weather and may compromise the validity of the calculations.
For each alert of anemometer $A_i$, the wind speed series of $A_{i+1}$ within the same time slice is retrieved to evaluate their correlation, and vice versa. This study employs the Spearman coefficient to assess trend similarity, because the wind speed data do not satisfy the prerequisites of Pearson correlation analysis, such as a linear relationship and normally distributed data.
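The Spearman correlation between the $j$th alert of $A_i$ and the corresponding time segment of $A_{i+1}$ is
$$r_{i,i+1}(j) = 1 - \frac{6\sum_{n=1}^{n_j} d_n^2}{n_j\left(n_j^2 - 1\right)} \qquad (4)$$
where $n_j$ is the length of alert $j$ of $A_i$, and $d_n$ is the difference between the two ranks of the $n$th observation.
The trend similarity of two adjacent monitoring spots is obtained as the mean of all $r_{i,i+1}$ and $r_{i+1,i}$ values:
$$Rt_{i,i+1} = \frac{1}{n_i + n_{i+1}}\left(\sum_{j=1}^{n_i} r_{i,i+1}(j) + \sum_{k=1}^{n_{i+1}} r_{i+1,i}(k)\right) \qquad (5)$$
where $n_i$ and $n_{i+1}$ are the numbers of alerts of $A_i$ and $A_{i+1}$, respectively.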
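The windowed trend similarity can be sketched as below. Ranks are averaged over ties, and the coefficient is computed as the Pearson correlation of the ranks, which reduces to the rank-difference formula of Eq. (4) when there are no ties. Alerts are given as `(start, end)` index pairs on the shared time axis. Skipping windows in which one series is constant (where the coefficient is undefined) is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and x[order[end + 1]] == x[order[pos]]:
            end += 1
        for p in range(pos, end + 1):
            ranks[order[p]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of average ranks (Eq. (4) when there are no ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # constant window: correlation undefined
    return cov / math.sqrt(vx * vy)

def trend_similarity(u: Sequence[float], v: Sequence[float],
                     alerts_u: List[Tuple[int, int]],
                     alerts_v: List[Tuple[int, int]]) -> float:
    """Mean Spearman coefficient over the alert windows of both stations (Eq. (5))."""
    coeffs = []
    for s, e in list(alerts_u) + list(alerts_v):
        r = spearman(u[s:e], v[s:e])
        if not math.isnan(r):
            coeffs.append(r)
    return sum(coeffs) / len(coeffs) if coeffs else float("nan")
```

In the full pipeline, `u` and `v` would first be smoothed with the moving average described in the next subsection.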

Moving Average
High-frequency wind speed data are highly volatile, which makes it difficult to detect trends within a short time frame. To address this issue, we use a moving average to remove stochastic fluctuations and reveal the underlying trend. With the simple moving average (SMA) method, each value of the original time series is replaced by the mean over a fixed-length window that slides from the start to the end of the series.
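A trailing simple moving average can be computed with a running sum, as sketched below. The window length is not specified in this section, so any value passed to this hypothetical helper is an illustrative choice. The trailing form also shortens the series by `window - 1` samples, so alert indices must be shifted accordingly; a centred window is an alternative.

```python
from typing import List, Sequence

def simple_moving_average(x: Sequence[float], window: int) -> List[float]:
    """Trailing SMA: out[t] = mean(x[t .. t + window - 1]) for each full window."""
    if window < 1 or window > len(x):
        raise ValueError("window must be between 1 and len(x)")
    running = sum(x[:window])
    out = [running / window]
    for t in range(window, len(x)):
        running += x[t] - x[t - window]  # slide the window forward by one sample
        out.append(running / window)
    return out
```

The running sum keeps the cost linear in the series length, which matters for multi-month 1-min records.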

Hybrid Significance
In this paper, hybrid significance is proposed as a refined metric that integrates warning similarity and trend similarity via a weighted average. This integration requires a series of consecutive anemometers and the values $Rt_{i,i+1}$ and $Rw_{i,i+1}$ for each decision section $i$, to which standardization and inverse processing are applied. First, the original values of warning similarity and trend similarity are standardized through Z-score standardization to bring them to a common scale. Then, because the need for an additional monitoring spot decreases as similarity increases, the standardized metrics are negated and exponentiated (i.e., passed through the softmax function), yielding the warning similarity significance ($W_1$) and the trend similarity significance ($W_2$). The calculation is given in Eqs. (6)-(8):
$$W_1^i = \frac{\exp\left(-Rw^{z}_{i,i+1}\right)}{\sum_{m}\exp\left(-Rw^{z}_{m,m+1}\right)} \qquad (6)$$
$$W_2^i = \frac{\exp\left(-Rt^{z}_{i,i+1}\right)}{\sum_{m}\exp\left(-Rt^{z}_{m,m+1}\right)} \qquad (7)$$
$$S_h^i = \omega_1 W_1^i + \omega_2 W_2^i \qquad (8)$$
where $Rw^{z}_{i,i+1}$ and $Rt^{z}_{i,i+1}$ are the standardized similarity measures of decision section $i$, the sums run over all decision sections $m$, $S_h^i$ is the hybrid significance of decision section $i$, and $\omega_1$ and $\omega_2$ are the weights to be determined.
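The hybrid significance can be sketched as follows. Each similarity metric is standardized, negated, and passed through a softmax over all decision sections, and the two results are then combined with the weights. The use of the population standard deviation in the Z-score and the function names are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

def zscore(x: Sequence[float]) -> List[float]:
    """Z-score standardization (population standard deviation assumed)."""
    mu, sd = mean(x), pstdev(x)
    if sd == 0:
        return [0.0] * len(x)  # all sections identical: nothing to distinguish
    return [(v - mu) / sd for v in x]

def softmax(x: Sequence[float]) -> List[float]:
    m = max(x)  # subtract the max for numerical stability
    ex = [math.exp(v - m) for v in x]
    total = sum(ex)
    return [v / total for v in ex]

def significance(similarity: Sequence[float]) -> List[float]:
    """Higher similarity means lower need: standardize, negate, softmax."""
    return softmax([-z for z in zscore(similarity)])

def hybrid_significance(rw: Sequence[float], rt: Sequence[float],
                        w1: float, w2: float) -> List[float]:
    """S_h = w1 * W1 + w2 * W2 for every decision section."""
    W1, W2 = significance(rw), significance(rt)
    return [w1 * a + w2 * b for a, b in zip(W1, W2)]
```

The section with the lowest similarity receives the largest significance, and each significance vector sums to 1. With $\omega_1 + \omega_2 = 1$, the hybrid scores therefore also sum to 1.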
The key challenge is to identify the combination of weights that yields a hybrid significance closely approximating the ground truth. This can be formulated and solved as a constrained optimization problem.

Independent Warning Coefficient
This paper introduces a metric, referred to as the independent warning coefficient ($I_w$), to indicate the actual need for installing supplementary anemometers along a given railway section. $I_w$ is calculated from historical wind speed data. Assuming three adjacent anemometers ($A_{i-1}$, $A_i$, and $A_{i+1}$), $I_w$ of $A_i$ is defined as the proportion of its warning segments that do not overlap with the warning segments of $A_{i-1}$ and $A_{i+1}$ (see Eqs. (9)-(11) for a precise formulation).
Given the $j$th alert of $A_i$ and the $k$th alert of $A_{i+1}$, the independent warning degree of alert $j$ against alert $k$ is defined as $I^{i,i+1}_w(j,k)$ in Eq. (9), where the other notations are the same as those in Eq. (2). If $n_i$ and $n_{i+1}$ alerts are reported at $A_i$ and $A_{i+1}$, respectively, the independent warning degree of $A_i$ against $A_{i+1}$, $I^{i,i+1}_w$, can be evaluated by Eq. (10). The final independent warning coefficient of $A_i$ is the average of its degrees against both neighbors:
$$I^i_w = \frac{1}{2}\left(I^{i,i+1}_w + I^{i,i-1}_w\right) \qquad (11)$$
As with the hybrid significance, we finally standardize the independent warning coefficients through the softmax function [Eq. (12)] and obtain $W_t^i$ as the true significance of anemometer $A_i$.
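The independent warning coefficient can be approximated as below. This hypothetical sketch does not use the pairwise form of Eqs. (9) and (10). Instead, it measures the share of an anemometer's total alert time that is not covered by the union of a neighbor's alerts, and then averages the results for the two neighbors as in Eq. (11). The alerts of one anemometer are assumed not to overlap each other. The resulting $I_w$ values would then be passed through the softmax to give $W_t$.

```python
from typing import List, Tuple

Alert = Tuple[float, float]  # (start, end)

def covered(start: float, end: float, others: List[Alert]) -> float:
    """Length of [start, end) covered by the union of the intervals in `others`."""
    clipped = sorted((max(s, start), min(e, end)) for s, e in others
                     if min(e, end) > max(s, start))
    total, cur_s, cur_e = 0.0, None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:      # disjoint from the current merged block
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                               # overlapping: extend the merged block
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total

def independent_degree(own: List[Alert], neighbour: List[Alert]) -> float:
    """Share of own alert time not overlapped by any neighbour alert (assumption)."""
    duration = sum(e - s for s, e in own)
    if duration == 0:
        return 0.0
    free = sum((e - s) - covered(s, e, neighbour) for s, e in own)
    return free / duration

def independent_warning_coefficient(own: List[Alert], left: List[Alert],
                                    right: List[Alert]) -> float:
    """I_w of the middle anemometer: mean of its degrees against both neighbours."""
    return 0.5 * (independent_degree(own, left) + independent_degree(own, right))
```

An anemometer whose alerts are never matched by its neighbors scores 1. This is the "crest" situation in which an intermediate sensor would reveal winds that the outer two miss.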

Quadratic Programming
In this section, we develop a constrained optimization model to determine $\omega_1$ and $\omega_2$. We treat $A_i$ as an unmeasured location and evaluate the need for an additional monitoring spot within the section between $A_{i-1}$ and $A_{i+1}$ (a decision section). Specifically, we compute the hybrid significance from the historical wind speed data of $A_{i-1}$ and $A_{i+1}$. Our objective is to minimize the gap between the estimated significance, measured by $S_h$, and the true significance, measured by $I_w$. The sum of squared errors over all decision sections is used as the loss function.
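The resulting model,
$$\min_{\omega_1,\omega_2}\ \sum_{i=1}^{N}\left(\omega_1 W_1^i + \omega_2 W_2^i - W_t^i\right)^2 \quad \text{s.t.}\quad \omega_1 + \omega_2 = 1,\ 0 \le \omega_1, \omega_2 \le 1,$$
can be transformed, after dropping the constant term $\sum_i \left(W_t^i\right)^2$, into the standard form of a convex quadratic program (QP) with linear constraints:
$$\min_{\boldsymbol{\omega}}\ \frac{1}{2}\boldsymbol{\omega}^\top\mathbf{P}\boldsymbol{\omega} + \mathbf{q}^\top\boldsymbol{\omega} \quad \text{s.t.}\quad \mathbf{G}\boldsymbol{\omega} \le \mathbf{h},\ \mathbf{A}\boldsymbol{\omega} = b \qquad (13)$$
The notations in the above model are as follows. $\boldsymbol{\omega} = (\omega_1, \omega_2)^\top$ is the vector of decision variables.
$\mathbf{P} = 2\sum_{i=1}^{N}\begin{pmatrix} \left(W_1^i\right)^2 & W_1^i W_2^i \\ W_1^i W_2^i & \left(W_2^i\right)^2 \end{pmatrix}$ is the quadratic coefficient matrix, where $W_1^i$ and $W_2^i$ are defined in Eqs. (6) and (7), and $N$ is the total number of sample anemometers (i.e., decision sections).
$\mathbf{q} = -2\sum_{i=1}^{N} W_t^i\left(W_1^i, W_2^i\right)^\top$ is the vector of linear parameters, where $W_t^i$ is defined in Eq. (12).
$\mathbf{G} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the coefficient matrix of inequality constraints, $\mathbf{h} = (0, 0, 1, 1)^\top$ is the vector of inequality constraint values, $\mathbf{A} = (1, 1)$ is the coefficient vector of the equality constraint, and $b = 1$ is the equality constraint value.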
To solve the QP, various algorithms such as active-set, interior-point, and conjugate gradient methods can be used [30]. After the optimal weights $\omega_1^{o}$ and $\omega_2^{o}$ are obtained, the hybrid significance of each decision section can be quantified, and the sections with high potential strong-wind risk can be selected for installing new anemometers.
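With only two weights and the constraint $\omega_1 + \omega_2 = 1$, the QP reduces to a one-dimensional convex problem. Substituting $\omega_2 = 1 - \omega_1$ gives the unconstrained minimizer $\omega_1 = \sum_i d_i r_i / \sum_i d_i^2$, with $d_i = W_1^i - W_2^i$ and $r_i = W_t^i - W_2^i$. This value is then clipped to $[0, 1]$, which is exact for a convex function of one variable. The sketch below uses this closed form as a dependency-free alternative to a general QP solver; the function names are hypothetical.

```python
from typing import Sequence, Tuple

def optimal_weights(W1: Sequence[float], W2: Sequence[float],
                    Wt: Sequence[float]) -> Tuple[float, float]:
    """Minimize sum_i (w1*W1_i + w2*W2_i - Wt_i)^2 s.t. w1 + w2 = 1, 0 <= w1, w2 <= 1."""
    d = [a - b for a, b in zip(W1, W2)]   # W1_i - W2_i
    r = [t - b for t, b in zip(Wt, W2)]   # Wt_i - W2_i
    dd = sum(x * x for x in d)
    if dd == 0:                            # W1 == W2: every split fits equally well
        return 0.5, 0.5
    w1 = sum(x * y for x, y in zip(d, r)) / dd
    w1 = min(1.0, max(0.0, w1))            # project onto the feasible interval
    return w1, 1.0 - w1

def loss(w1: float, w2: float, W1: Sequence[float], W2: Sequence[float],
         Wt: Sequence[float]) -> float:
    """Sum of squared errors between the hybrid and true significance."""
    return sum((w1 * a + w2 * b - t) ** 2 for a, b, t in zip(W1, W2, Wt))
```

With CVXOPT installed, the same problem can instead be passed to `cvxopt.solvers.qp(P, q, G, h, A, b)` using the standard-form matrices defined in this section. Both approaches should give the same weights up to solver tolerance.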

Case Study
In this section, we demonstrate the aforementioned methodology on a real-world case. The Hanghai railway is an intercity line that carries commuters between the cities of Hangzhou and Haining in Zhejiang Province, China. The railway line operates in a coastal zone with a typical subtropical monsoon climate, which results in numerous meteorological hazards [31]. The line spans over 30 miles and has 13 stations, including five underground and eight elevated stations. At each elevated station, a three-cup anemometer has been installed atop the roof platform of the annex. The route map and the facilities are shown in Fig. 3.

Exploratory Data Analysis
The wind speed data were collected at a frequency of 1/60 Hz (1-min data) at each elevated station from November 2021 to March 2023. An exploratory data analysis (EDA) is first conducted to investigate the fundamental characteristics of the data. As shown in Fig. 4, the distribution of wind speed in this region is positively skewed with a long tail, indicating that calm weather and light breezes prevail. Fig. 4b shows the tail, i.e., strong wind speeds exceeding 13 m/s, corresponding to force 6 on the Beaufort scale.
Before comparing the stations, we evaluate the anomalies and missing data in the dataset according to the criteria proposed in Sect. 2.1.
Missing data are assessed at each monitoring spot by quantifying the maximum proportion of a single missing data segment relative to the total dataset. A proportion over 5% is regarded as the threshold for deleting a segment [32]. The findings are presented in Table 1. Notably, Hainingxi Railway Station exhibits a missing data segment that accounts for over 10% of the total data; this segment is shown in Fig. 5a. Deleting this segment is recommended, because filling it by interpolation would significantly distort the original information. Accordingly, when assessing similarity in the following steps, the corresponding segments of the adjacent stations (Xucun and Changan) are also deleted. The length distribution of the remaining missing data segments is illustrated in Fig. 6, which shows that over 90% of the segments are shorter than 30 min. In light of this observation, linear interpolation is the preferred method for filling the missing segments in this dataset. As a simple and efficient approach, linear interpolation produces smooth series and preserves the trend. Moreover, linear interpolation only produces values within the range of the given data points, whereas spline and polynomial interpolation methods may generate values beyond a reasonable range. Existing studies have also applied linear interpolation to treat missing data under short-term data loss [33].
The threshold for abnormally high values is set to 32.7 m/s, the wind speed of hurricane conditions. For abnormally low values, we apply a 60-min time window to the data from each monitoring spot (referred to as the target spot) and check whether all observed wind speed values within the window are lower than 0.2 m/s (force 0 on the Beaufort scale). If a window satisfies this criterion, we calculate the average gap between the data of the adjacent monitoring spots and those of the target spot in the same window. The window of the target spot is identified as containing outliers if the average gap is larger than 3 m/s (the difference between force 0 and force 3 on the Beaufort scale). As a result, no abnormally high values are identified in the dataset. However, a 50-h period of wind speed values at Zhouwangmiao station is recognized as low outliers; a subset of these outliers is visualized in Fig. 5b. This segment, along with the corresponding segments of the adjacent stations, is eliminated. Otherwise, it would compromise the accuracy of both the warning similarity and trend similarity obtained in the following steps and undermine the validity of the evaluation results.
After preprocessing, we compare the wind speed data at different stations, including their descriptive statistics and warning behaviors, as presented in Table 2. Since the actual strong-wind warning threshold (13.9 m/s, force 7 on the Beaufort scale) is hardly reached at some stations, we lower the threshold to + 5 to ensure that every station receives alerts for further analysis. Despite the comparatively short distance (approximately 5 km) between adjacent observation stations, the historical maximum wind speed and the number of alerts differ significantly among stations. This variation in wind speed data along the railway underscores the complex and dynamic wind patterns in the area and highlights the need to augment the existing strong-wind warning system with additional anemometers.
In the remainder of the case study, we demonstrate and validate the proposed methodology. Specifically, for every three consecutive stations, the middle one serves as a decision point (section) where an additional anemometer may be installed. For instance, for Xucun, Hainingxi Railway Station, and Changan, the data of Hainingxi Railway Station are disregarded, and the similarity of the data observed at Xucun and Changan is evaluated using the proposed methodology.

Ground Truth of Installation Necessity
First, we assess the actual requirement for installing an anemometer based on the independent warning coefficient ( I w ).The assessment excludes Xucun and Xieqiao stations.
According to the ranking of $I_w$ (Fig. 7), deploying anemometers at Tongjiu Highway, Zhouwangmiao, and Hainingxi Railway Station is the most necessary, while Yanguan station shows the least demand. These results are consistent with the EDA presented in Table 2. Notably, high $I_w$ values were generally observed where the maximum wind speed exceeded those measured at neighboring stations (i.e., a crest), whereas relatively low $I_w$ values were obtained where the maximum wind speed was lower than at adjacent stations (i.e., a trough).

Hybrid Significance Calculation
For each decision section, the warning similarity and trend similarity are evaluated. The corresponding values of $W_1$ and $W_2$ are then calculated by reversing the sign of the standardized similarities and applying the softmax function. The results are shown in Table 3.
Given the ground truth of installation requirements ($I_w$) and the $W_1$ and $W_2$ values of all decision sections, we employ the QP method to fit $I_w$ as a linear combination of $W_1$ and $W_2$. The optimal weight combination is determined with the CVXOPT library for Python. The optimal values of $\omega_1$ and $\omega_2$ are 0.818 and 0.182, respectively, from which the hybrid significance ($S_h$) values are obtained. Rather than making decisions based on the absolute values of $S_h$, we recommend using them to compare the relative need for installing anemometers. The results are plotted in Fig. 8. In validation group 1, the priority ranking based on $S_h$ is congruent with the true ranking based on $I_w$. In validation group 2, the proposed method misjudges the relative significance of decision sections 2 and 4. Nevertheless, $S_h$ correctly identifies decision section 6 as the top priority.

Discussion
In this section, the results are further discussed.We verify the reliability and robustness of our proposed methodology by comparative analysis and sensitivity analysis.
Firstly, to verify the reliability of the proposed methodology, a comparison is carried out among different approaches for assessing the potential need for installation, and the results are illustrated in Table 4, where sections are denoted as "S."The asterisks (*) indicate instances where the decision section was ranked incorrectly when compared to the ground truth.
Table 4 shows that the similarity-metric-based methods outperform the spatial-interpolation-based methods in assessing potential strong wind risk. For the interpolation methods, inverse distance weighted (IDW) interpolation is applied to estimate the mean and maximum wind speeds at each section from the data recorded at neighboring stations. The results indicate that interpolation cannot handle cases in which the wind pattern in the section between two anemometers differs from that at both ends: because an IDW estimate is a weighted average of the neighboring observations, it always lies within their range and therefore cannot reproduce a local crest or trough. Among the similarity-metric-based approaches, the hybrid significance is more robust than warning similarity or trend similarity used alone. Specifically, the method using warning similarity as the only indicator performs better in validation group 1, whereas the method based on trend similarity achieves the correct ranking in validation group 2. Combining the two metrics is therefore expected to yield more robust evaluations across different scenarios.
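The limitation of IDW follows directly from its definition. The sketch below uses the common power parameter of 2, which is an assumption (the parameter used in the study is not stated here), with hypothetical distances and wind speeds: a section lying between stations that recorded maxima of 20 and 16 m/s is always assigned a value between 16 and 20 m/s, however strong the local wind actually is.

```python
def idw_estimate(distances, values, power=2.0):
    """Inverse distance weighted estimate at a target location.

    Each known value is weighted by 1 / distance**power; the result is a
    weighted average, so it always lies between min(values) and max(values).
    """
    for d, v in zip(distances, values):
        if d == 0.0:
            return v  # target coincides with a station
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical section 3 km from station A (20 m/s) and 7 km from station B (16 m/s).
estimate = idw_estimate([3.0, 7.0], [20.0, 16.0])
print(round(estimate, 2))  # 19.38
```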
We next discuss the fusion of the two similarity metrics. In this paper, the installation necessity is modeled as a linear combination of $W_1$ and $W_2$, and a QP model is proposed to obtain the optimal weights from the available data. However, because the number of decision samples is small, the optimal weights obtained here may not generalize well to new samples. To verify the robustness of the fusion approach, a series of weight combinations and the corresponding priority rankings for each validation group were compiled in Table 5. The columns labeled "loss 1" and "loss 2" give the value of the QP objective function for the two validation groups, respectively.
As Table 5 shows, the proposed fusion method is robust: the priority ranking remains largely unchanged as the weights vary, which suggests that, in this case, the method is insensitive to the choice of weights. This insensitivity does not diminish the importance of finding suitable weights, however, because it is partly attributable to the small size of the dataset. With a larger dataset, the QP model is expected to play a greater role in determining weights that fit the ground truth. In summary, the proposed hybrid method is effective in assessing potential strong wind risks and the need for supplementary anemometers in areas with complex wind patterns.
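The kind of sensitivity check summarized in Table 5 can be sketched as a sweep over the first weight, with the second fixed to one minus it. The loss below is the sum of squared residuals, which is an assumption about the form of the QP objective; the grid size, function names, and inputs are hypothetical. For these inputs the ranking is identical at every grid point even though the loss changes, mirroring the insensitivity described above.

```python
def rank_order(scores):
    """Section indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def sweep_weights(w1, w2, target, steps=11):
    """Evaluate a1 on an even grid over [0, 1] (steps >= 2) with a2 = 1 - a1.

    Returns a list of (a1, ranking, loss) tuples, where loss is the assumed
    QP objective: the sum of squared differences between S_h and the target.
    """
    results = []
    for k in range(steps):
        a1 = k / (steps - 1)
        s_h = [a1 * x + (1.0 - a1) * y for x, y in zip(w1, w2)]
        loss = sum((s - t) ** 2 for s, t in zip(s_h, target))
        results.append((a1, rank_order(s_h), loss))
    return results


# Hypothetical inputs for three decision sections.
results = sweep_weights([0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.48, 0.31, 0.21])
for a1, ranking, loss in results:
    print(f"a1={a1:.1f}  ranking={ranking}  loss={loss:.5f}")
```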

Conclusions
A low-density wind sensor layout along railway tracks may be insufficient for detecting wind hazards and is also likely to result in excessive operational control, which undermines operational efficiency. Effectively optimizing the existing anemometer deployment along tracks is therefore an essential practical challenge. This paper assesses the significance of installing monitoring sensors in specific railway segments using similarity-based methods. The methodology relies only on existing wind speed monitoring data, without requiring additional meteorological data or complicated simulation models, which supports informed decision-making under limited data and cost. The existing approach is enhanced by integrating warning similarity and trend similarity measures, which account for the distinct alert behaviors and local environmental factors of adjacent anemometers. Furthermore, a quadratic programming model is developed to determine the weights used to fuse the two metrics. The validity of the proposed methodology is demonstrated using the Hanghai intercity railway line as a case study. A comparative analysis of different evaluation methods indicates that the proposed method is particularly effective in identifying potential strong wind risks in railway sections whose wind patterns deviate from those recorded at adjacent stations. The method is also more robust than the other methods across different validation groups.
However, this research is subject to certain limitations arising from the dataset size and other factors. Specifically, since the hybrid significance is derived solely as a linear combination of $W_1$ and $W_2$, more complex nonlinear models (e.g., artificial neural networks) could be explored in future studies. In addition, stronger validation procedures such as cross-validation were not used, primarily because of the limited number of monitoring sites. Despite these limitations, the proposed methodology offers railway operators a useful tool for improving their monitoring systems under limited data and budgets. Furthermore, the evaluation framework is transferable to other engineering domains, such as wind turbine installations.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 3 Hanghai railway route map and the deployment site of anemometers

Table 1
Summary of missing data at each station in the dataset

Table 2
Comparison of wind speed data at different stations
ᵃ The warning threshold is set to μ + 5σ, where μ is the mean and σ is the standard deviation of the overall dataset

Table 4
Comparison of different methods