Cluster Analysis as a Basis for the Development of an Application Assessing the Reliability of Transport Infrastructure

Increased demand for transport services and increased mobility of citizens can reduce the reliability of transport systems. This, in turn, increases the demand among both transport companies and individual users for ways to assess the reliability of road infrastructure. The article presents the substantive basis of an application for determining the reliability of transport infrastructure. Our approach groups information into clusters using the authors' proprietary clustering method, which is based on a detailed analysis of the road infrastructure in terms of the errors occurring on it, divided into conceptual, design and operational errors. The methodology consists of three stages of clustering: (1) creating a database of sections with assigned errors, (2) determining the preliminary clusters and (3) creating a final database of clusters. The reliability of the road infrastructure of the transport system is then assessed on the basis of these clusters. The application is intended to remain open-ended, i.e. the database will be developed by its users. The proposed methodology was verified on a selected route in Poland (between Kalisz and Szczecin). The experiment on this route identified a number of errors in the road infrastructure, including errors with a high frequency of occurrence (the so-called permanent errors), which further confirms the need for an application to assess the reliability of road infrastructure.


Introduction
The transport system consists of several areas, including the material and technical area (related to road infrastructure and means of transport), the economic and organizational area, the institutional and legal area, the spatial and functional area, and others. Determining a universal measure of reliability for the entire system is therefore a very complex challenge, especially since the system is exposed to numerous disruptions. Such disruptions include, for example, road accidents and collisions [1], whose occurrence is stochastic in nature [2][3][4]. It is not possible to accurately determine the place, time and type of an accident or road collision. However, it is possible to indicate places within the road infrastructure where there is an increased risk of adverse events [5]. Depending on the research methodology, these places are defined as high-risk zones [6], danger zones [7], black sections, black spots [8-10], etc. Many methods have been developed to determine the location of individual elements of the transport system [11], including black spots on both urban roads [12] and extra-urban roads [13]. The main disadvantage of these solutions is the database on which the research is based: it must contain information on adverse events that occurred in the area covered by the analysis. As a result, hazardous zones can be designated only after a certain period of road use and after a certain number of accidents and collisions [14]. In addition, when the only available data are the numbers of participants in accidents and collisions at a specific time, without accurate reports on the nature and type of the adverse events, such zones may be designated erroneously. This occurs when, for example, a weather anomaly causes several accidents on one day, or many vehicles participate in a single road event [15]; the threshold number of road events is then exceeded and a high-risk zone is designated at that place. Another problem that is often not taken into account is the complexity of the factors affecting the occurrence of an adverse event, e.g. those related to the reliability of means of transport [16].
Such situations increase the risk of non-performance of the transport task. Improving transport safety is therefore one of the most important objectives of EU transport policy, which aims to meet the expectations of modern society regarding increased mobility and a better quality of life, especially following the COVID-19 pandemic [17].
Making decisions on shaping the transport system requires information support [18][19][20].Correct utilization of collected road safety information [21] can support the design and subsequent operation of the transport system, ensuring a high level of its reliability [22][23][24][25].
Accordingly, the existing state of knowledge needs to be revised, and a methodology for designing and analyzing the operational reliability of the transport system needs to be developed, enabling a systemic review of safety issues [26].
Road Infrastructure Safety Management (RISM) [27][28][29], which determines the operational reliability of the transport system [30] by separating dangerous road sections, is therefore of strategic importance.
Directive 2008/96/EC of the European Parliament and of the Council sets out the directions for uniform and integrated management of road infrastructure safety in the EU [31]. The Directive identifies four tools recommended for use in RIS management procedures, the first of which is the assessment of the impact of a planned road on road safety in the network of cooperating roads (Road Safety Impact Assessment). The requirements contained in the Directive form the basis for road safety studies, which include:
1. Selection of methods for identifying dangerous sections on the existing road network (black spots and black sections) [32].
2. Identification of dangerous spots and/or sections.
4. Ordering the dataset on dangerous infrastructure segments by grouping them.
5. Analyzing the results of the ordering performed.
The methodology of operational reliability analysis of transport infrastructure is of particular significance, as it allows for conducting research on the irregularities (errors) occurring in the infrastructure and for grouping individual sections into clusters [34][35][36].
Various methods are known for analyzing and processing information in the field of transport systems [37]. The effectiveness of cognitive processes aimed at studying events and phenomena that may disrupt the transport process, for example cargo processes in freight transport, is increasing significantly [38]. Every attempt to broaden the information [39] relies on procedures for collecting and processing data in order to organize (structure) them according to certain parameters, e.g. the level of threats to traffic flow, the probability of road accidents, etc. To organize the collected data, they are clustered [40][41][42][43] or classified [44][45][46][47], which makes it possible to account for the information-processing requirements of the subsequent application design.
These considerations led to the development of a concept for assessing the reliability of road infrastructure that can be used not only by the authorities managing it, but also by transport companies and individual users. This concept constitutes the substantive basis for an application that will enable the reliability of a selected route (divided into sections) to be assessed.

Methodology
The concept of the application for assessing the reliability of road infrastructure is based on an open, editable database. This database should contain a set of routes, together with an analysis of the errors occurring on them, creating an image of the entire transport system in material, technical, spatial and functional terms [48][49].
The basic version of the database should be created empirically for a limited area and then gradually expanded by the users of the application. Errors occurring on the route are divided into three categories [50]: conceptual errors, design errors and operating errors. The list of errors is also open, i.e. it is not definitively defined. This reflects the changes that take place in the functioning of transport systems, both on the infrastructure side and in matters related to the means of transport, organization, management, etc. Table 1 presents a fragment of the database of identified errors [51]:
…: The proximity of technical infrastructure elements, among others elements of bridges, trestle bridges, high-voltage and telecommunications lines, sewage, gas, heating and water supply pipes, drainage devices, etc.
$x_{p5}$: Poor visibility at nodes/intersections (obstacles)
$x_{p6}$: Cross-section of the road that is not adapted to the function it performs
$x_{pn}$: …
Source: Authors' own elaboration.
Creating a database of routes, and then assigning errors to them, forms the basis for the subsequent clustering of infrastructure sections. The clustering procedure consists of three stages [48].
Stage I - creating a database of sections with errors assigned to them
This stage starts with the selection of a route from the route database. The route is then divided into sections of equal length:

$$R = \{s_1, s_2, \ldots, s_n\} \tag{1}$$

where: $R$ - the analyzed route, $s_i$ - the $i$-th section of the route, where $i = 1, 2, \ldots, n$, $n$ - the number of route sections.
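The division step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a route is described only by its total length in kilometres and that this length is an exact multiple of the section length; the function name `split_route` is hypothetical.

```python
def split_route(length_km: float, section_km: float) -> list[tuple[float, float]]:
    """Divide a route into consecutive sections of equal length.

    Returns (start_km, end_km) pairs. Hypothetical helper: the route is
    assumed to be given only by its total length.
    """
    if section_km <= 0:
        raise ValueError("section length must be positive")
    n = round(length_km / section_km)  # number of route sections
    if abs(n * section_km - length_km) > 1e-9:
        raise ValueError("route length must be a multiple of the section length")
    return [(i * section_km, (i + 1) * section_km) for i in range(n)]


sections = split_route(460, 5)
print(len(sections))              # 92
print(sections[0], sections[-1])  # (0, 5) (455, 460)
```

With the values used in the verification (a 460 km route and 5 km sections), the sketch yields 92 sections. Rejecting a length that is not a multiple of the section length is a design choice of this sketch; the methodology itself only requires sections of equal length.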
Because of the significant differences between individual road classes, we propose dividing the roads into two classes: motorways together with expressways, and other, lower-class roads. The reason is a significant difference both in the types of errors and in the average number of errors occurring on the different road classes.
The subsequent stages of clustering are carried out independently for the two road classes, but according to the same principles. After the division into classes, errors are identified on the selected route, which forms the basis for creating the databases of sections with assigned errors.

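$$X(s_i) = X_p(s_i) \cup X_k(s_i) \cup X_e(s_i)$$

$$X_p(s_i) = \{x_{p1}, x_{p2}, \ldots\}, \qquad X_k(s_i) = \{x_{k1}, x_{k2}, \ldots\}, \qquad X_e(s_i) = \{x_{e1}, x_{e2}, \ldots\}$$

where: $X(s_i)$ - the set of errors of route section $s_i$, $X_p(s_i)$ - the set of design errors of the section, $X_k(s_i)$ - the set of conceptual errors of the section, $X_e(s_i)$ - the set of operating errors of the section, $x_{pj}$ - the $j$-th design error, $x_{kj}$ - the $j$-th conceptual error, $x_{ej}$ - the $j$-th operating error.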
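One possible way to hold the per-section error sets in an application is sketched below in Python. The class name `Section`, its fields and the section identifier are hypothetical. The mapping of the error-code prefixes used in the tables (p for design, k for conceptual, e for operating errors) is an assumption inferred from the error examples.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A route section with its errors split into the three categories.

    Hypothetical structure; prefix mapping assumed: p = design,
    k = conceptual, e = operating.
    """
    section_id: str
    design: set[str] = field(default_factory=set)      # e.g. {"x_p5"}
    conceptual: set[str] = field(default_factory=set)  # e.g. {"x_k1"}
    operating: set[str] = field(default_factory=set)   # e.g. {"x_e1"}

    @property
    def errors(self) -> set[str]:
        """Set of all errors of the section: union of the three category sets."""
        return self.design | self.conceptual | self.operating


s = Section("I.1", design={"x_p5"}, conceptual={"x_k1"}, operating={"x_e1", "x_e2"})
print(sorted(s.errors))  # ['x_e1', 'x_e2', 'x_k1', 'x_p5']
```

Keeping the three categories as separate sets preserves the design/conceptual/operating split, while the `errors` property gives the union used in the later clustering steps.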

Stage II - designation of preliminary clusters
The second stage of clustering consists in determining preliminary clusters, which are later transformed into the final clusters. Each cluster (both preliminary and final) consists of an initiating section and the sections attached to it. A cluster is constructed as follows:
1. The number of errors occurring on each section of the route is calculated, and then the frequency of each identified error is determined according to the following formula:

$$f_x = \frac{N_x}{n} \tag{6}$$

where: $f_x$ - the frequency of occurrence of error $x$ within the route [-], $N_x$ - the number of errors of the same type occurring on all analyzed sections, $n$ - the number of route sections.
This frequency is determined for each of the three groups of errors (design, conceptual and operating).
2. After the calculations, it is checked whether any error identified on the route sections has a frequency greater than or equal to 0.8. If so, the set of such errors (single- or multi-element) is defined as the set of constant errors. Constant errors are not taken into account in the subsequent part of the clustering.
3. Next, a list of route sections is created, ranked hierarchically by the number of errors on each section, excluding constant errors. The section with the highest number of errors occupies the top of the list. If several sections have the same number of errors, the one placed higher on the list is the section with the higher sum of the frequencies of the errors identified on it.

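The frequencies of errors in the individual groups are calculated as follows:

$$f_p = \frac{N_p}{n}, \qquad f_k = \frac{N_k}{n}, \qquad f_e = \frac{N_e}{n} \tag{7}$$

where: $f_p$ - the frequency of occurrence of a design error within the route, $f_k$ - the frequency of occurrence of a conceptual error within the route, $f_e$ - the frequency of occurrence of an operating error within the route, $N_p$ - the number of design errors of the same type identified within the route, $N_k$ - the number of conceptual errors of the same type identified within the route, $N_e$ - the number of operating errors of the same type identified within the route, $n$ - the number of route sections.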
4. After the list is created, the highest-ranking section is selected as a potential initiating section of a new cluster. Subsequent sections of the route are then appended to the newly formed cluster: for each section of the route, the coefficient of error overlap (WPB) and the level of correlation (PK) with respect to the initiating section are calculated.
The coefficient of error overlap is the ratio of the number of errors occurring simultaneously on the appended section and on the initiating section to the number of all errors of the appended section.
The method of determination is illustrated in Table 2. The number of errors occurring simultaneously on the appended and the initiating section (marked in green) is 4, while the total number of errors on the appended section is 5 (the error that does not occur on the initiating section is marked in red). Thus, the WPB of the appended section is 4/5 = 0.8.
The level of correlation (PK) is determined on the basis of formula (8).
If the section meets the criteria for both WPB and PK, it is selected as a section capable of co-creating a cluster.

5. After all sections of the route have been analyzed, a decision is made on creating a preliminary cluster. The condition for its creation is that at least two sections of the route are attached to the initiating section. If this condition is not met, the cluster is dissolved: the initiating section goes to the cluster of dispersed sections, while any attached section returns to the database of route sections. If the condition is met, the preliminary cluster is added to the database of preliminary clusters.
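Steps 1-4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: an error is counted at most once per section when computing N, the section data and error codes are hypothetical, and only WPB is implemented because the PK formula is not reproduced in this text.

```python
from collections import Counter

CONSTANT_THRESHOLD = 0.8  # frequency at or above which an error is "constant"


def error_frequencies(sections: dict[str, set[str]]) -> dict[str, float]:
    """Step 1: frequency f = N / n of each error type within the route.

    N is the number of sections on which the error occurs (counted at most
    once per section, an assumption of this sketch); n is the number of sections.
    """
    n = len(sections)
    counts = Counter(err for errs in sections.values() for err in errs)
    return {err: count / n for err, count in counts.items()}


def constant_errors(freq: dict[str, float]) -> set[str]:
    """Step 2: errors whose frequency is greater than or equal to 0.8."""
    return {err for err, f in freq.items() if f >= CONSTANT_THRESHOLD}


def rank_sections(sections: dict[str, set[str]], freq: dict[str, float],
                  constant: set[str]) -> list[str]:
    """Step 3: highest number of non-constant errors first; ties are broken
    by the higher sum of the frequencies of those errors."""
    def key(sid: str) -> tuple[int, float]:
        errs = sections[sid] - constant
        return len(errs), sum(freq[e] for e in errs)
    return sorted(sections, key=key, reverse=True)


def wpb(appended: set[str], initiating: set[str]) -> float:
    """Step 4: coefficient of error overlap = shared errors / errors of the appended section."""
    if not appended:
        raise ValueError("WPB is undefined for a section without errors")
    return len(appended & initiating) / len(appended)


# Hypothetical five-section route (error codes are placeholders)
sections = {
    "S1": {"x_k1", "x_p1", "x_p2", "x_e1", "x_e3", "x_k4"},
    "S2": {"x_k1", "x_p1", "x_p2", "x_e1", "x_e3", "x_e5"},
    "S3": {"x_k1", "x_p1", "x_e1"},
    "S4": {"x_k1", "x_e5"},
    "S5": {"x_p2"},
}
freq = error_frequencies(sections)
constant = constant_errors(freq)
reduced = {sid: errs - constant for sid, errs in sections.items()}
print(sorted(constant))                         # ['x_k1']
print(rank_sections(sections, freq, constant))  # ['S2', 'S1', 'S3', 'S5', 'S4']
print(wpb(reduced["S1"], reduced["S2"]))        # 0.8
```

In this toy route, S1 and S2 tie on the number of non-constant errors, so the frequency sum decides their order. The WPB of S1 with respect to S2 follows the 4/5 = 0.8 pattern of the Table 2 example. Note that WPB is asymmetric: its denominator is the error count of the appended section only.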
The procedure is repeated as long as more than two sections remain in the database. When no sections remain, or fewer than three, the process ends and any remaining sections are transferred to the cluster of dispersed sections.
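The cluster-forming loop of step 5 and the repetition rule can be sketched as follows. The matching criterion is passed in as a function because the PK formula and the Stage II thresholds are not reproduced here. `subset_match` (equivalent to WPB = 1) is a hypothetical stand-in, and the ranked list and error sets are invented for illustration.

```python
from typing import Callable

Matcher = Callable[[set[str], set[str]], bool]


def preliminary_clusters(ranked: list[str], errors: dict[str, set[str]],
                         matches: Matcher) -> tuple[list[list[str]], list[str]]:
    """Stage II loop: returns (preliminary clusters, dispersed sections).

    ranked  -- section ids, highest-ranking first (constant errors already removed)
    errors  -- non-constant errors of each section
    matches -- matches(candidate, initiating) is True if the candidate may join;
               a hypothetical stand-in for the WPB/PK criteria
    """
    remaining = list(ranked)
    clusters: list[list[str]] = []
    dispersed: list[str] = []
    while len(remaining) > 2:                 # repeat while more than two sections remain
        init = remaining.pop(0)               # highest-ranking section initiates a cluster
        attached = [s for s in remaining if matches(errors[s], errors[init])]
        if len(attached) >= 2:                # at least two sections must be attached
            clusters.append([init, *attached])
            remaining = [s for s in remaining if s not in attached]
        else:
            dispersed.append(init)            # any attached section stays in the database
    dispersed.extend(remaining)               # fewer than three sections left
    return clusters, dispersed


def subset_match(candidate: set[str], initiating: set[str]) -> bool:
    """Hypothetical strict criterion (WPB = 1): every candidate error
    also occurs on the initiating section."""
    return bool(candidate) and candidate <= initiating


errors = {
    "S1": {"a", "b", "c"}, "S4": {"d", "e"}, "S7": {"f", "g"},
    "S2": {"a", "b"}, "S3": {"a", "c"}, "S5": {"d"}, "S6": {"e"},
}
ranked = ["S1", "S4", "S7", "S2", "S3", "S5", "S6"]
clusters, dispersed = preliminary_clusters(ranked, errors, subset_match)
print(clusters)   # [['S1', 'S2', 'S3'], ['S4', 'S5', 'S6']]
print(dispersed)  # ['S7']
```

Because each pass removes the initiating section from the database, the loop always terminates; passing the criterion as a function keeps the loop independent of the exact WPB and PK thresholds.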

Stage III - final cluster database
The third stage of clustering aims to create the final database of clusters. To this end, the following steps are carried out:
1. A list of the sections in the preliminary clusters is created, excluding the initiating sections and the sections from the cluster of dispersed sections.
2. The sections are then relocated: every section on the list is rechecked against the initiating sections using the altered criterion $WPB = 0.8$, while the previously defined criterion concerning PK is maintained. In the relocation process, the matching measures of each section on the list are calculated with respect to all initiating sections, and the section joins the initiating section for which its measure is the highest, provided that the criteria are met.
3. After all sections from the list have been analyzed, a list of the newly created clusters is compiled. Each cluster is checked against the condition from the second stage (a cluster is created only if at least two route sections are attached to the initiating section). If the condition is met, the cluster is placed in the database of preliminary clusters; otherwise, the remaining sections are transferred to the cluster of dispersed sections.
4. After all newly created clusters have been analyzed, the final cluster database is created.
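The relocation logic of Stage III can be sketched in the same hedged way. `score` and `accept` are hypothetical stand-ins for the measures named in step 2. The example uses WPB as the score and WPB ≥ 0.8 as the acceptance test, treating the stated 0.8 value as a lower bound. Sending unmatched sections, and the initiating section of a dissolved cluster, to the dispersed sections are assumptions of this sketch.

```python
from typing import Callable

Measure = Callable[[set[str], set[str]], float]
Criterion = Callable[[set[str], set[str]], bool]


def relocate(clusters: list[list[str]], errors: dict[str, set[str]],
             score: Measure, accept: Criterion) -> tuple[list[list[str]], list[str]]:
    """Stage III sketch: returns (final clusters, sections moved to the dispersed cluster).

    Each preliminary cluster is [initiating, attached, ...]. `score` picks the best
    initiating section and `accept` encodes the relaxed criteria; both are
    hypothetical stand-ins for the measures used in the paper.
    """
    initiators = [c[0] for c in clusters]
    to_check = [s for c in clusters for s in c[1:]]      # step 1: non-initiating sections
    members: dict[str, list[str]] = {i: [] for i in initiators}
    dispersed: list[str] = []
    for s in to_check:                                   # step 2: relocation
        eligible = [i for i in initiators if accept(errors[s], errors[i])]
        if eligible:
            best = max(eligible, key=lambda i: score(errors[s], errors[i]))
            members[best].append(s)
        else:
            dispersed.append(s)                          # assumption: no match -> dispersed
    final: list[list[str]] = []
    for i in initiators:                                 # step 3: re-check cluster size
        if len(members[i]) >= 2:
            final.append([i, *members[i]])
        else:
            dispersed.extend([i, *members[i]])           # assumption: initiator dispersed too
    return final, dispersed                              # step 4: final cluster database


def wpb(appended: set[str], initiating: set[str]) -> float:
    """Coefficient of error overlap: shared errors / errors of the appended section."""
    return len(appended & initiating) / len(appended)


# Toy data chosen to show one relocation and one dissolved cluster
clusters = [["A", "A1", "A2"], ["B", "B1", "B2"]]
errors = {
    "A": {"a", "b", "c", "d"}, "A1": {"a", "b", "c"}, "A2": {"a", "b", "d"},
    "B": {"a", "e", "f"}, "B1": {"e", "f"}, "B2": {"a", "b", "c", "d", "e"},
}
final, dispersed = relocate(clusters, errors, score=wpb,
                            accept=lambda c, i: wpb(c, i) >= 0.8)
print(final)      # [['A', 'A1', 'A2', 'B2']]
print(dispersed)  # ['B', 'B1']
```

In the toy data, B2 shares four of its five errors with A (WPB = 0.8) but only two with B (WPB = 0.4), so it is relocated to the cluster initiated by A. The cluster initiated by B then keeps only one attached section and is dissolved.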
The reliability class of each cluster is determined according to the following principles. The first step is to create a list of the initiating sections of the clusters under consideration. Then, for each initiating section, the reliability level is calculated from the number of errors identified within that route section and the sum of all conceptual, design and operational errors.
After the reliability level of the initiating section has been calculated, its value is assigned to the entire cluster that the section represents. The cluster's reliability level is then compared with the designated confidence level intervals, on the basis of which a reliability class is assigned to the cluster. Once reliability levels have been assigned to all clusters in the database, the final step is to create a final list of clusters with the calculated reliability levels.
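The assignment of reliability classes to clusters can be sketched as below. Because the reliability-level formula and the class intervals of Table 6 are not reproduced here, the sketch takes the reliability level of each initiating section as input and uses hypothetical class limits. It also assumes that a higher reliability level corresponds to a better class, with class 1 the highest, as in the paper.

```python
def reliability_class(level: float, lower_bounds: list[float]) -> int:
    """Return the class number (1 = highest reliability) for a reliability level.

    `lower_bounds` holds the lower limit of each class interval in descending
    order; the limits used below are hypothetical, not the paper's values.
    """
    for cls, bound in enumerate(lower_bounds, start=1):
        if level >= bound:
            return cls
    raise ValueError("level below the lowest class bound")


def classify_clusters(levels: dict[str, float], lower_bounds: list[float]) -> dict[str, int]:
    """Each cluster inherits the level, and hence the class, of its initiating section."""
    return {cid: reliability_class(lvl, lower_bounds) for cid, lvl in levels.items()}


bounds = [0.8, 0.6, 0.4, 0.0]                   # hypothetical class limits
levels = {"cluster_1": 0.85, "cluster_2": 0.55}  # hypothetical initiating-section levels
print(classify_clusters(levels, bounds))         # {'cluster_1': 1, 'cluster_2': 3}
```

Only the initiating section of each cluster has to be evaluated; all attached sections receive the same class, which is the computational saving the method relies on.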

Results and Discussion
The proposed method was verified on a 460 km route in Poland, between Kalisz and Szczecin, which was divided into 92 sections of 5 km each. These sections were assigned to eleven groups: the first ten groups consist of 9 sections each, and the last group consists of the remaining two sections. In line with the methodology, the sections were divided into motorways and expressways, and other roads.
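A quick arithmetic check shows that the section and group counts given for the verification route are mutually consistent:

$$\frac{460\ \text{km}}{5\ \text{km}} = 92\ \text{sections}, \qquad 10 \times 9 + 2 = 92\ \text{sections}\ \text{(eleven groups)}$$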
A fragment of the database containing the identified errors on the route is presented in Table 3.
Table 3
Fragment of the database concerning identified errors on the selected route section
No. | Section FROM | Section TO
Source: Authors' own elaboration.
As a result of the clustering process, three initiating sections were designated for sections on motorways and expressways, and eight initiating sections (Table 4) for sections on other roads (i.e. roads other than motorways and expressways).
On the basis of calculations regarding the frequency of errors on the analyzed route, constant errors for different road classes were determined.The results of the calculation of the frequency of constant errors for sections containing other roads (other than motorways and expressways) are presented in Fig. 1.
In the case of motorways and expressways, none of the errors reached the level of 0.8, so no constant errors were found. For roads of a lower class than motorways and expressways, four constant errors were found ($x_{k1}$, $x_{k2}$, $x_{k3}$, $x_{e2}$).
As a result of the procedure of attaching route sections to the initiating sections, 13 clusters were created: 4 with sections on motorways and expressways, including one cluster of dispersed sections, and 9 with sections on lower-class roads, also including one cluster of dispersed sections. An example of the clusters designated for sections of a class lower than motorways and expressways is presented in Table 5.
Assessment of the reliability of the road infrastructure for the transport system
The reliability of the route was assessed in relation to the designated clusters. For this purpose, reliability classes with assigned reliability levels were developed.
Based on the guidelines presented in the methodology, the reliability classes for individual clusters were developed.The results are presented in Table 6.Reliability classes were determined on the basis of [46], where Cl.1 is the highest reliability class.

Conclusion
The substantive basis of the application for assessing the reliability of road infrastructure presented in this publication is the clustering of route sections. The advantage of clustering is that it facilitates the analysis of the operational reliability of the infrastructure, because it eliminates the need to analyze every section of the route in order to determine its reliability level.
Instead of individual sections, only the initiating section of each cluster is analyzed, and the sections attached to it in accordance with the adopted matching criteria are deemed identical. This shortening of the computational process speeds up the operation of an application based on the proposed methodology.
The proposed approach, which groups sections into clusters and then determines risk measures for the respective clusters instead of a single global risk value for the entire route, supports decision-making aimed at reducing the risk of failure to perform a transport task.
The experiment, which consisted in recording the route and subsequently analyzing it, confirmed the implementation potential of the proposed methodology, which was the objective of the present work. The results indicate that there are a number of errors in the road infrastructure, including errors with a high frequency of occurrence, namely: no shoulders/emergency lanes, vegetation in the immediate vicinity of the road, no bypasses/transit roads passing through built-up areas, and vegetation overlapping the roadside.
We performed an assessment of the reliability of road infrastructure on the basis of the designated clusters.

Table 1
Fragment of the database on road infrastructure errors
$x_{e1}$: Poor condition of the road surface, e.g. ruts, bumps, cracks, incorrectly made patches, cavities in the pavement, etc.

Table 4
Initiating sections for roads other than motorways and expressways

Table 5
The presented example of a cluster concerns roads that are neither motorways nor expressways, for which four constant errors, $x_{k2}$, $x_{k3}$, $x_{k1}$ and $x_{e2}$ (marked in blue), were determined with frequencies of 0.93, 0.93, 0.91 and 0.89, respectively. The initiating section was section V.2, located 185 km from the starting point. Three sections were joined with it: