1 Introduction

The transport system comprises several areas: material and technical (road infrastructure and means of transport), economic and organizational, institutional and legal, spatial and functional, and others. Determining a universal measure of reliability for the entire system is therefore a very complex challenge, particularly because the system is exposed to numerous disruptions.

Such disruptions include, for example, road accidents and collisions [1], the occurrence of which is stochastic in nature [2,3,4]. It is not possible to accurately determine the place, time and type of an accident or road collision. However, it is possible to indicate places within the road infrastructure, where there is an increased risk of adverse events [5]. These places, depending on the research methodology, are defined as high-risk zones [6], danger zones [7], black sections, black spots [8,9,10], etc.

Many methods have been developed to determine the location of individual elements of the transport system [11], including black spots, whether on urban roads [12] or extra-urban roads [13]. The main disadvantage of these solutions is the database on which the research is based: it must contain information on adverse events that occurred in the area covered by the analysis. Hazardous zones can therefore be designated only after a certain period of road use and after a certain number of accidents and collisions [14]. In addition, when the data cover only the number of participants in accidents and collisions at a specific time, without accurate reports on the nature and type of the adverse events, there is a risk of designating such zones erroneously. This occurs when, for example, a weather anomaly causes several accidents on one day or many vehicles are involved in a single road event [15], so that the threshold number of road events is exceeded and a high-risk zone is designated at that location. Another problem that is often overlooked is the complexity of the factors affecting the occurrence of an adverse event, e.g. those related to the reliability of means of transport [16].

As a result of such situations, the risk of non-performance of the transport task increases. Therefore, improving transport safety is one of the most important objectives of the EU transport policy, which aims to meet the expectations of modern society related to increasing mobility and improving the quality of life, especially following the COVID-19 pandemic [17].

Making decisions on shaping the transport system requires information support [18,19,20]. Correct utilization of collected road safety information [21] can support the design and subsequent operation of the transport system, ensuring a high level of its reliability [22,23,24,25].

In connection with the above, there is also the need to revise the existing state of knowledge and develop a methodology for designing and analyzing the operational reliability of the transport system, which will create the possibility of a systemic review of safety issues [26].

Road Infrastructure Safety Management (RISM) [27,28,29], which determines the operational reliability of the transport system [30] by separating dangerous road sections, is therefore of strategic importance.

Directive 2008/96/EC of the European Commission is a document that sets out the directions for action consisting in uniform and integrated management of road infrastructure safety in the EU [31]. The abovementioned Directive identifies four tools recommended for use in RIS Management procedures:

  1. Assessment of the impact of the planned road on road safety in the network of cooperating roads (Road Safety Impact Assessment),

  2. Identification of hazardous sections (Safety Ranking Audit),

  3. Identification of road sections with a high accident concentration (AC Classification) and road network safety ranking (NS Ranking),

  4. Road infrastructure safety inspection (Safety Inspections).

The requirements contained in the Directive form the basis for road safety tests, which include:

  1. Selection of methods for identifying dangerous sections on the existing road network (black spots and black sections) [32].

  2. Identification of dangerous spots and/or sections.

  3. Risk assessment for selected infrastructure segments [33].

  4. Ordering of the dataset on dangerous segments of infrastructure by grouping them.

  5. Analyzing the results of the ordering performed.

The methodology of operational reliability analysis of transport infrastructure is of particular significance, as it allows for conducting research in the scope of irregularities (errors) occurring in the infrastructure and grouping individual sections into clusters [34,35,36].

Various methods are known for analyzing and processing information in the field of transport systems [37]. Cognitive processes aimed at studying events and phenomena that can disrupt the transport process, for example cargo handling in freight transport, are becoming significantly more effective [38]. Every attempt to broaden the information [39] relies on procedures for collecting and processing data in order to organize (structure) them according to certain parameters, e.g. the level of threats to traffic flow or the probability of road accidents. To organize the collected data, they are clustered [40,41,42,43] or classified [44,45,46,47], which makes it possible to take into account the information processing requirements of the subsequent application design.

The presented considerations lead to the development of a concept that will allow for the assessment of the reliability of road infrastructure not only for the use of the authorities managing it, but also for transport companies and individuals. This concept constitutes the substantive basis for the development of an application that will enable the assessment of a selected route (divided into sections) in terms of its reliability.

The next part of the article presents the methodology, which includes an excerpt from the road infrastructure error database and a three-stage procedure. The following part, Results and Discussion, verifies the presented methodology on the basis of a field experiment. The last part of the paper contains the conclusions drawn from the research, the limitations of the proposed methodology and directions for further research.

2 Methodology

The concept of the application for assessing the reliability of road infrastructure is based on an open, editable database. This database should contain the set of routes, along with an analysis of the errors occurring on them, creating an image of the entire transport system in material, technical, spatial and functional terms [48, 49].

The basic version of the database should be created empirically in a limited area and then gradually expanded by the users of the application. Errors occurring on the route are divided into three categories [50]:

  • conceptual errors,

  • design errors,

  • operating errors.

The list of errors is also open, i.e. it is not definitively defined. This is related to the changes that take place in the functioning of transport systems, both from the infrastructure side, as well as issues related to the means of transport, organization and management, etc. Table 1 presents a fragment of the database of identified errors [51].

Table 1 Fragment of the database on road infrastructure errors

The creation of a database with routes, and then with errors assigned to them, constitutes the basis for the further clustering process of infrastructure sections. The clustering was developed on the basis of three stages [48].

2.1 Stage I – Creating a database of sections with errors assigned to them

This stage starts with the selection of the route Tp from the route database. The route is then divided into sections Om of equal length:

$${T}_p=\left\{{O}_m:m=1,2,\dots, M\right\},$$
(1)

where:

Tp:

the analyzed route,

Om:

the section of the route where m = 1, …, M,

M:

number of route sections

Due to the significant differences between the individual classes of roads, we proposed to divide them into motorways along with express roads and other roads of a lower class. This is due to a significant difference in both the types of errors occurring and the average number of errors occurring on different road classes.

The subsequent stages of clustering are carried out independently for the two distinguished road classes, but according to the same principles. After the division into classes, the errors x occurring on the selected route are determined, which forms the basis for creating databases of sections with assigned errors.

$$XO= XP\cup XK\cup XE,$$
(2)

where:

XO:

the set of errors of the route section,

XP:

the set of design errors of the route section,

XK:

the set of conceptual errors of the route section,

XE:

the set of operating errors of the route section.

$$XP=\left\{{xp}_i:i=1,2,\dots, {LB}_{xp}\right\},$$
(3)

where:

xpi:

design error, i-th, where i = 1, …, LBxp,

$$XK=\left\{{xk}_j:j=1,2,\dots, {LB}_{xk}\right\},$$
(4)

where:

xkj:

conceptual error, j-th, where j = 1, …, LBxk,

$$XE=\left\{{xe}_k:k=1,2,\dots, {LB}_{xe}\right\},$$
(5)

where:

xek:

operating error, k-th, where k = 1, …, LBxe.
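Stage I can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the `Section` class, the `split_route` helper and the example error codes are hypothetical names introduced here, not part of the described application. The route length and section length are the values used later in the verification (460 km and 5 km).

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One route section O_m with its three error sets (Eqs. 2-5)."""
    index: int                                      # m = 1, ..., M
    start_km: float
    end_km: float
    design: set = field(default_factory=set)        # XP
    conceptual: set = field(default_factory=set)    # XK
    operating: set = field(default_factory=set)     # XE

    @property
    def errors(self) -> set:
        """XO as the union of XP, XK and XE (Eq. 2)."""
        return self.design | self.conceptual | self.operating


def split_route(length_km: float, section_km: float) -> list:
    """Divide a route T_p into M sections of equal length (Eq. 1)."""
    count = round(length_km / section_km)
    if count < 1 or abs(count * section_km - length_km) > 1e-9:
        raise ValueError("route length must be a positive multiple of the section length")
    return [Section(m, (m - 1) * section_km, m * section_km)
            for m in range(1, count + 1)]


# Hypothetical data: a 460 km route in 5 km sections, errors on the first section.
route = split_route(460, 5)
route[0].conceptual.update({"xk1", "xk2"})
route[0].operating.add("xe2")
print(len(route), sorted(route[0].errors))
```

Keeping the three error categories as separate sets and exposing their union as a property mirrors Eq. (2) directly, so the later stages can work on the union alone.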

2.2 Stage II – Designation of preliminary clusters

The second stage of clustering consists in determining preliminary clusters, which are later transformed into the final clusters. Each cluster (both preliminary and final) consists of an initiating section OI, and sections attached to it. Its construction is carried out as follows:

  1. 1.

    The number of errors that occur on each section of the route is calculated, and then the frequency of identified errors is determined according to the following formula

    $${C}_s=\frac{LB_s}{M},$$
    (6)

    where:

    Cs:

    the frequency of error occurrence within the route [−],

    LBs:

    the number of errors of the same type occurring on all analyzed sections,

    M:

    number of route sections.

    This frequency is determined for every group of errors.

  2. 2.

    After performing the calculations, it is checked whether any errors have a frequency of occurrence on the route greater than or equal to 0.8. If so, the set of such errors (single or multi-element) is defined as the set of constant errors. Constant errors are not taken into account in the subsequent part of the clustering.

  3. 3.

    The next stage is to create a list of hierarchically arranged sections of the route in relation to the number of errors on the respective section, excluding constant errors. The section with the highest number of errors occupies the highest place on the list. If there is more than one section with the same number of errors, the section placed higher on the list is the one with the higher sum of the error frequencies that were identified on it.

    $${\sum}_{x=1}^{LB_s}{C}_s={\sum}_{i=1}^{LB_{xp}}{C}_{xp_i}+{\sum}_{j=1}^{LB_{xk}}{C}_{xk_j}+{\sum}_{k=1}^{LB_{xe}}{C}_{xe_k},$$
    (7)

    where:

    Cxpi:

    the frequency of design error occurrence within the route,

    Cxkj:

    the frequency of conceptual error occurrence within the route,

    Cxek:

    the frequency of operating error occurrence within the route,

    LBxp:

    number of design errors of the same type identified within the route,

    LBxk:

    number of conceptual errors of the same type identified within the route,

    LBxe:

    number of operating errors of the same type identified within the route.

  4. 4.

    After the list is created, the highest-ranking section OI is selected as the potential section initiating the creation of a cluster. Subsequent sections of the route are then appended to the newly formed cluster: for each section of the route, the coefficient of error overlap (WPB) and the level of correlation (PK) in relation to the initiating section are calculated.

    The coefficient of error overlap is the ratio of the number of errors occurring simultaneously on the appended section and on the initiating section to the number of all errors of the appended section.

    The method of determining WPB is presented in Table 2. The number of errors occurring simultaneously on the appended and the initiating section (marked in green) is 4, while the total number of errors on the appended section is 5 (the error that does not occur on the initiating section is marked in red). Thus, the WPB of the appended section is \(\frac{4}{5}=0.8\).

    Table 2 WPB Calculation example

    The level of correlation PK is determined on the basis of the following formula:

    $$PK=\frac{LB_d}{LB_I}.$$
    (8)

    If the section meets the criteria WPB ≥ 0.75 and PK ≥ 0.5, it is selected as a section capable of co-creating a cluster.

  5. 5.

    After analyzing all sections of the route, a decision is made to create a preliminary cluster. The condition for its creation is the addition of at least two sections of the route to the initiating section. If this condition is not met, a two-element cluster shall be dissolved. The initiating section goes to the cluster of dispersed sections, while the attached section returns to the database of route sections. If the preliminary cluster creation condition is met, the preliminary cluster is added to the preliminary clusters database.

As long as there are more than two sections remaining in the database, the procedure is repeated. However, if there are no more sections in the database or their number is less than three, the process ends and any remaining sections are transferred to the cluster of dispersed sections.
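The steps of Stage II can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. Two points are assumptions made here: LBd and LBI in Eq. (8) are read as the numbers of errors on the appended and the initiating section, and all function names and example data are hypothetical.

```python
def frequencies(sections: dict) -> dict:
    """C_s = LB_s / M for every error type (Eq. 6)."""
    m = len(sections)
    counts = {}
    for errs in sections.values():
        for e in errs:
            counts[e] = counts.get(e, 0) + 1
    return {e: n / m for e, n in counts.items()}


def wpb(appended: set, initiating: set) -> float:
    """Share of the appended section's errors also found on the initiating one."""
    return len(appended & initiating) / len(appended) if appended else 0.0


def pk(appended: set, initiating: set) -> float:
    """Eq. 8, ASSUMING LB_d and LB_I are the error counts of the two sections."""
    return len(appended) / len(initiating) if initiating else 0.0


def preliminary_clusters(sections, const_thr=0.8, wpb_thr=0.75, pk_thr=0.5):
    freq = frequencies(sections)
    constant = {e for e, c in freq.items() if c >= const_thr}
    # Constant errors are ignored in the rest of the clustering.
    pool = {name: errs - constant for name, errs in sections.items()}
    clusters, dispersed = {}, []
    while len(pool) > 2:
        # Rank by error count; ties are broken by the sum of frequencies (Eq. 7).
        init = max(pool, key=lambda n: (len(pool[n]),
                                        sum(freq[e] for e in sorted(pool[n]))))
        init_errs = pool.pop(init)
        members = [n for n, errs in pool.items()
                   if wpb(errs, init_errs) >= wpb_thr and pk(errs, init_errs) >= pk_thr]
        if len(members) >= 2:
            clusters[init] = members
            for n in members:
                del pool[n]
        else:
            dispersed.append(init)  # any attached section stays in the pool
    dispersed.extend(pool)          # fewer than three sections are left
    return constant, clusters, dispersed


# Hypothetical example: error "c" occurs on every section and becomes constant.
example = {
    "A": {"x1", "x2", "x3", "x4", "c"},
    "B": {"x1", "x2", "x3", "c"},
    "C": {"x1", "x2", "c"},
    "D": {"x5", "c"},
    "E": {"x6", "c"},
}
constant, clusters, dispersed = preliminary_clusters(example)
```

In this example, section A initiates a cluster joined by B (WPB = 1, PK = 0.75) and C (WPB = 1, PK = 0.5), while D and E end up in the cluster of dispersed sections.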

2.3 Stage III – Final cluster database

The third stage of clustering aims to create the final database of clusters. To this end, the following steps are carried out:

  1. 1.

    A list of sections in the preliminary clusters is created, excluding initiating sections and sections from the dispersed sections cluster.

  2. 2.

    Then the process of relocation of sections is carried out. It consists in rechecking all the sections from the list in relation to the initiating sections according to the altered criterion PK, while maintaining the previously defined criterion WPB ≥ 0.75. In the relocation process, for each section from the list, PK is calculated in relation to all initiating sections. The section joins the initiating section for which its PK is the highest, while meeting the condition WPB ≥ 0.75.

  3. 3.

    After analyzing all sections from the list, the list of newly created clusters is created. Each of the clusters is checked according to the condition from the second stage (the creation of the cluster is conditioned by adding at least two sections of the route to the initiating section). If this condition is met, the cluster is placed in the database of preliminary clusters, while if the condition is not met, the remaining sections are transferred to the cluster of dispersed sections.

  4. 4.

    After analyzing all newly created clusters, the final cluster database is created.
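The relocation described in the steps above can be sketched in Python. The sketch rests on assumptions that the text does not settle: PK is read as the ratio of the error counts of the relocated and the initiating section, a section that matches no initiating section is sent to the dispersed cluster, and the function name and data are hypothetical.

```python
def relocate(initiating: dict, members: dict, wpb_thr: float = 0.75):
    """Reassign each non-initiating section to its best-matching initiating section."""
    clusters = {name: [] for name in initiating}
    dispersed = []
    for name, errs in members.items():
        best, best_pk = None, -1.0
        for init, init_errs in initiating.items():
            if not errs or not init_errs:
                continue
            wpb = len(errs & init_errs) / len(errs)
            pk = len(errs) / len(init_errs)      # ASSUMED reading of Eq. 8
            if wpb >= wpb_thr and pk > best_pk:  # highest PK wins
                best, best_pk = init, pk
        if best is None:
            dispersed.append(name)               # ASSUMPTION: no match means dispersed
        else:
            clusters[best].append(name)
    final = {}
    for init, attached in clusters.items():
        if len(attached) >= 2:                   # same size condition as in Stage II
            final[init] = attached
        else:
            dispersed.extend([init, *attached])
    return final, dispersed


# Hypothetical example with two initiating sections.
initiating = {"I1": {"a", "b", "c", "d"}, "I2": {"a", "b"}}
members = {"S1": {"a", "b"}, "S2": {"a", "b"}, "S3": {"c", "d"}, "S4": {"z"}}
final, dispersed = relocate(initiating, members)
```

Here S1 and S2 move to I2, where their PK is 1 instead of 0.5, so the cluster around I1 is left with a single attached section and is dissolved.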

Determination of the reliability class for individual clusters is carried out according to the following principles. The first step is to create a list of initiating sections of the considered clusters. Then, for each section, the reliability level is calculated according to the following formula:

$${PN}_{O_I}=1-\frac{LB_s}{\sum x},$$
(9)

where:

\({PN}_{O_I}\):

reliability level of the calculated initiating section,

LBs:

number of errors identified within the route section,

∑x:

the sum of all conceptual, design and operational errors.

After calculating the reliability level of the initiating section, its value is assigned to the entire cluster it represents. Then, the cluster reliability level is compared with the designated confidence level intervals, on the basis of which a reliability class is assigned to the cluster. Once reliability levels have been assigned to all clusters from the database, the final step is to create the final cluster list with the calculated reliability levels.
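Eq. (9) and the class assignment can be illustrated with a short Python sketch. The class boundaries below are purely hypothetical placeholders, because the intervals actually used are not given in this section. The total of 40 errors is likewise an invented value for the example.

```python
def reliability_level(section_errors: int, total_errors: int) -> float:
    """PN = 1 - LB_s / sum(x) for an initiating section (Eq. 9)."""
    if total_errors <= 0 or not 0 <= section_errors <= total_errors:
        raise ValueError("expected 0 <= section_errors <= total_errors and total_errors > 0")
    return 1 - section_errors / total_errors


# HYPOTHETICAL lower bounds of the reliability classes (Cl.1 is the highest class).
CLASS_BOUNDS = [(0.9, "Cl.1"), (0.8, "Cl.2"), (0.7, "Cl.3")]


def reliability_class(pn: float, bounds=CLASS_BOUNDS, lowest: str = "Cl.4") -> str:
    """Return the first class whose lower bound does not exceed pn."""
    for lower, label in bounds:
        if pn >= lower:
            return label
    return lowest


# Example: 6 errors on the initiating section out of an assumed total of 40.
pn = reliability_level(6, 40)
cluster_class = reliability_class(pn)
```

Because the level of the initiating section is assigned to the whole cluster, `reliability_level` needs to be evaluated only once per cluster.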

3 Results and Discussion

The proposed method was verified on a 460 km route in Poland between Kalisz and Szczecin (Fig. 1), which was divided into 92 sections of 5 km each. These sections were assigned to 11 groups: the first 10 groups consist of 9 sections each, and the last group consists of the remaining two sections. According to the methodology, the route was divided into motorways and expressways, and other roads. The route and its individual sections were selected according to the following criteria: the route should include a minimum of three classes of roads, including motorways and expressways; the road must be used for freight transport; and the road must be of significant economic importance for the regions through which it passes [48].

Fig. 1 Route used for verification of the proposed method (Kalisz-Szczecin)

A fragment of the database containing identified errors on the route is presented in Table 3.

Table 3 Fragment of the database concerning identified errors on the selected route section

On the section shown in Table 3, six errors in road infrastructure were identified, while on the entire analyzed route the total number of errors was 346. If the same error occurred several times on an analyzed section (5 km), it was counted once.

As a result of the clustering process, three initiating sections were designated for sections on motorways and expressways, along with eight initiating sections (Table 4) for sections on other roads (i.e. other than motorways and expressways).

Table 4 Initiating sections for roads other than motorways and expressways

On the basis of calculations regarding the frequency of errors on the analyzed route, constant errors for different road classes were determined. The results of the calculation of the frequency of constant errors for sections containing other roads (other than motorways and expressways) are presented in Fig. 2 [48].

Fig. 2 Graph of the frequency of errors on sections with a class lower than motorways and expressways. Source: Authors’ own elaboration

In the case of motorways and expressways, none of the errors reached the frequency of 0.8, so no constant errors were found. For roads with a class lower than motorways and expressways, four constant errors were found (xk1, xk2, xk3, xe2).

As a result of attaching route sections to the initiating sections, 13 clusters were created: 4 clusters of sections on motorways and expressways (including one cluster of dispersed sections) and 9 clusters of sections on lower-class roads (including one cluster of dispersed sections). An example of a cluster designated for sections with a class lower than motorways and expressways is presented in Table 5.

Table 5 Example of a designated cluster

The presented example of a cluster concerns roads that are neither motorways nor expressways, for which four constant errors xk2, xk3, xk1 and xe2 (marked in blue) were determined, with frequencies of 0.93, 0.93, 0.91 and 0.89, respectively. The initiating section was section V.2., located 185 km from the starting point. Three sections were joined to it:

  • V.9. with WPB = 1 and PK = 1,

  • III.1. with WPB = 1 and \(PK=\frac{1}{2}\),

  • IV.8. with WPB = 1 and \(PK=\frac{1}{2}\).
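The section labels used above can be related to distances along the route with a small Python sketch. It assumes, as the text suggests but does not state explicitly, that the Roman numeral is the group number in route order, the Arabic number is the position within a group of nine 5 km sections, and the quoted distance refers to the start of the section. The helper name is hypothetical.

```python
# Roman numerals for the 11 groups of sections.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
         "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11}


def section_start_km(label: str, group_size: int = 9, section_km: int = 5) -> int:
    """Distance from the route start to the beginning of a section such as 'V.2.'."""
    group, position = label.rstrip(".").split(".")
    # Sequential section number m along the route (1-based).
    number = (ROMAN[group] - 1) * group_size + int(position)
    return (number - 1) * section_km


start = section_start_km("V.2.")
```

Under these assumptions, section V.2. is the 38th section of the route and starts at 185 km, which agrees with the distance given for the initiating section.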

3.1 Assessment of the reliability of the road infrastructure for the transport system

The reliability of the route was assessed in relation to the designated clusters. For this purpose, reliability classes with assigned reliability levels were elaborated.

Based on the guidelines presented in the methodology, the reliability classes for individual clusters were developed. The results are presented in Table 6. Reliability classes were determined on the basis of [46], where Cl.1 is the highest reliability class.

Table 6 Infrastructure Reliability Assessment

The results presented in Table 6 allow us to conclude that there is a very wide range of designated reliability classes for the clusters along the analyzed route. This is understandable given that the studied route consists of roads of different categories. The determined classes from points 10 to 12 apply to clusters in which highway and expressway sections were concentrated. The remaining clusters include roads of lower categories, resulting in a visibly lower reliability class.

4 Conclusion

The substantive basis of the application for assessing the reliability of road infrastructure presented in this publication is the section clustering process. The advantage of clustering is that it facilitates the analysis of the operational reliability of the infrastructure, because it eliminates the need to analyze all sections of the route in order to determine its reliability level. Instead of individual sections, only the initiating section of the respective cluster is analyzed, and the sections connected to it in accordance with the adopted matching criteria are deemed identical. Shortening the computational process in this way increases the speed of an application based on the proposed methodology.

The proposed approach, consisting in grouping sections into clusters and then determining risk measures in relation to the respective clusters, instead of determining one global risk value for the entire route, allows for a decision-making process aimed at reducing the risk of failure to perform a transport task.

The conducted experiment consisting in recording of the route and its subsequent analysis confirmed the implementation potential of the proposed methodology, which was the objective of the present work. The obtained results indicate that there are a number of errors on the road infrastructure, including errors with a high frequency of occurrence, i.e. no shoulder/emergency lanes, vegetation in the immediate vicinity of the road, no bypasses/transit roads passing through built-up areas, vegetation overlapping the roadside. We performed an assessment of the reliability of road infrastructure on the basis of the designated clusters.

The key achievement of the work is the application of cluster analysis (clustering) to the assessment of the reliability of transport infrastructure. The proposed method makes it possible, among other things, to assess the reliability of transport routes as a whole, as well as by section, which increases the flexibility of the application’s use for commercial purposes.

The main limitation of the proposed methodology is that it concerns a constantly evolving research area, i.e. the transportation system. Consequently, the databases need to be supplemented periodically. This may affect, among other things, the constant errors defined in the paper, which will require verification over time.

In further research, the authors will focus on extending the proposed methodology to include further criteria relevant to the reliability of transport infrastructure, e.g. the occurrence of traffic incidents, congestion or infrastructure modernization works.