1 Introduction

Organisations managing a large inheritance of old tunnels and underground structures are confronted with the need to guarantee full safety of use while optimising their overall maintenance costs. This is particularly critical for railway tunnels: in France, for example, the mean age of railway tunnels is 124 years, and 80 % of them are over 100 years old. For the maintenance of tunnels, diagnosing pathologies is an important reasoning task. Tunnel experts carry out periodic inspections that lead to an evaluation of a tunnel’s global condition by identifying the main pathologies from their possible causes, in the form of disorders and diagnosis influencing factors [1]. This is a complex process that is prone to subjectivity and scales poorly across cases and domains. To address this problem, a decision support system (DSS) called PADTUN is being developed in the EU project NeTTUN (nettun.org), involving tunnel experts and knowledge engineers. The Pathology Assessment and Diagnosis of TUNnels (PADTUN) system is applied in a tunnel diagnosis use case with the French national railway, SNCF.

For tunnel diagnosis, in addition to inferring possible pathologies in individual tunnel portions, it is also important to consider spatial elements, such as inferring continuous tunnel portions (called here ‘regions of interest’ (ROI)) with similar types of pathologies. The key challenge is to develop an aggregation mechanism that groups individual portions into larger regions of interest based on the similarity of their pathologies. This abstraction is extremely important for efficiency reasons. For example, a two-kilometre tunnel with ten-metre portions will have 200 portions for tunnel experts to inspect. Hence, an appropriate aggregation that produces regions of interest, and ultimately reduces the number of individual portions to inspect, will facilitate the diagnosis process and improve its efficiency. The prime driver for building PADTUN is to capture the tacit knowledge required for successful completion of these tasks, in order to preserve the knowledge and expertise of the very few experts in such organisations. Although the cost of performing these tasks well is very small, maintenance operations can be costly and the impact of a tunnel malfunction can be catastrophic.

The PADTUN system is a novel DSS for tunnel diagnosis and maintenance using semantic technologies. PADTUN assists tunnel experts in making decisions about a tunnel’s condition with respect to its disorders and diagnosis influencing factors. PADTUN also allows reviewing regions of interest with similar pathologies. The use of semantics is a very fitting proposition in developing DSS [2]. For example, one prominent area where semantics has been applied is making explicit the domain knowledge required for decisions [3]. In our work, PADTUN ontologies are developed and used to model tacit knowledge from tunnel experts. These ontologies capture the existing decision process concerning maintenance of tunnels and provide a context model for automated decision support. The PADTUN ontologies are the first ever ontologies developed for the domain of tunnel diagnosis and maintenance. Another prominent aspect where semantics is utilised in PADTUN development is fulfilling the requirement of the DSS and the decision maker to have access to heterogeneous data. The ability of semantic technologies to enable the fusion of heterogeneous data has been exploited in a number of projects [4]. In PADTUN development, heterogeneous data providing contextual information are annotated with ontologies to take advantage of the inferencing capabilities offered by semantic technologies. We take semantics even further and use the PADTUN ontologies for calculating homogeneous portions in order to identify regions of interest. In particular, semantics plays a key role in detecting continuity by considering the semantic similarity between pathologies represented as concepts. With this work, we contribute to semantic web research by applying semantic technologies in urban and infrastructure planning and maintenance, a domain that is starting to receive attention from the semantic web community [5].

Section 2 outlines the technical architecture of the PADTUN system. Sections 3 and 4 describe the two main components of this architecture. An initial evaluation of the system is described in Sect. 5. We conclude by discussing the findings and outlining future work.

2 Pathology Assessment and Diagnosis of Tunnels (PADTUN)

Figure 1 depicts the integrated view of the PADTUN system. PADTUN is designed using a three-tier architecture consisting of layers for interface, application and data.

Fig. 1. PADTUN system architecture

Data Layer. Infrastructure managers, including SNCF, own and manage inspection databases of tunnels that record the provenance of data related to inspections and any repairs. This data contains information about any disorders diagnosed during inspections and contextual factors. In the case of PADTUN, this inspection data is made available from SNCF’s internal system as XML dumps.

In order for the system to work, PADTUN requires domain-specific knowledge. Encoding and specifying such knowledge in ontologies is one of the main contributions of this paper. The PADTUN ontologies were designed in consultation with the tunnel experts in the project and by extensively reviewing the literature on the subject. These ontologies codify knowledge about tunnel disorders, diagnosis influencing factors, lining materials and pathologies. The PADTUN ontologies are described in more detail in Sect. 3. The data layer also contains a semantic repository that stores the ontologies and performs reasoning. OWLIM was chosen for scalability reasons [6], as the system is required to reason over a large number of tunnels and inspection data. The system also contains a relational database (MySQL) that stores inspection and result data for caching purposes.

Processing Layer. PADTUN includes an intelligent processing layer, built on the data layer, that suggests pathologies per portion and regions of interest, together with explanations.

Three components are included in the processing layer. The pathology inferencing component, implemented as a RESTful service, uses the ontologies and infers a list of pathologies when provided with the disorders, lining material and diagnosis influencing factors of individual tunnel portions. Internally, it infers pathologies based on (i) disorders and lining material and (ii) diagnosis influencing factors, and creates a cumulative list. The ROI inferencing component, also implemented as a RESTful service, uses the ontologies and the output of the pathology inferencing service to infer regions of interest. Both of these services are described in detail in Sect. 4. The data management component contains the business logic to convert the XML data to the database with the help of a converter, and stores the inspection data according to the new schema dictated by the ontology. To achieve this conversion, the component includes a mapping between the schema and the ontologies.

Presentation Layer. This layer consists of a user interface that allows decision makers to interact with the DSS. The interface allows users to upload tunnel inspection data and view and manipulate the results from the pathology and ROI inferencing services. The interface component is implemented using PHP and JavaScript.

The following sections focus on the main components of the architecture: the PADTUN ontologies and the pathology and ROI inferencing services.

3 PADTUN Ontologies

The PADTUN ontologies are developed using the METHONTOLOGY methodology [10]. The NeTTUN use cases helped us to define the scope and purpose of the ontologies and provided a reasonably well-defined target.

Scope & Purpose. The ontologies need to capture the existing decision process concerning the diagnosis of tunnels, to provide a context model for automated decision support. This conceptual model should include disorders observed during the inspections, tunnel common pathologies and diagnosis influencing factors. This knowledge also needs to be classified and linked, in order to identify associations of disorders and diagnosis influencing factors with pathologies.

Knowledge Sources. The ontologies are designed based on the knowledge of experts within the NeTTUN project. To ensure a wide range of use and generality, extensive literature in the area [1, 7–9] has been consulted.

3.1 Conceptualisation

This activity requires that the domain knowledge is structured in a conceptual model describing the problem and its solution in terms of a domain [10]. We used a number of methods for knowledge elicitation, including expert interviews and brainstorming sessions, with tools such as IHMC Concept Maps to facilitate the conceptualisation process. Initial conceptualisation focused on the elicitation of the top-level ontology concepts.

Top Level Concepts. Several tunnel type classifications were considered. For instance, tunnels can be classified regarding their operational use, construction method, age and other characteristics. The proposed classification regarding the PADTUN scope is based on an elementary part of a tunnel, an atomic portion, called here tunnel portion. A tunnel portion can be defined as “an elementary part of the tunnel with all the necessary elements that enable a diagnosis to be made” [1, 8]. In this respect, a tunnel portion presents a geology, a geometry, and structural characteristics such as lining and repair features.

A tunnel portion is derived from larger tunnel stretches. Because the scope of the ontologies is maintenance, these larger tunnel stretches have been defined as Tunnel Inspection Stretch, corresponding to tunnel lengths where an inspection has been carried out. This Tunnel Inspection Stretch has one location and has been inspected at least once. Further, within a Tunnel Inspection Stretch, and regarding Geology, one or more Tunnel Geo Stretch can be identified, each one characterized by one single geology. This conceptualisation is presented in a concept map in Fig. 2.

Fig. 2. Concept map with the top-level concepts related to a tunnel

Pathologies. A pathology is a problem that causes tunnel disorders; it is also the link between the disorders and their causes. Pathologies provoke tunnel degradation, which manifests itself in a combination of disorders, often more than one. Based on interviews with tunnel experts and the literature on the subject, the most common pathologies have been identified and classified according to these degradation processes. These were collected from the experts as a knowledge glossary [10, 11].

Tunnel disorders. Disorders are disturbances in the expected quality level of a tunnel and are subject to evolution. Disorders are also symptoms of pathologies. A classification of disorders was collected from the experts as a knowledge glossary. The associations between disorders and pathologies were provided as a table (see Fig. 3). In total, 227 such associations were provided by the experts.

Fig. 3. Association between pathologies and disorders, with (1) the mortar ageing pathology as an example; (2) shows the coded list of lining materials that have to be present to manifest mortar ageing; (3) shows the disorders, i.e. “potentially unstable” (structure), that have to be present to manifest mortar ageing. The coloured cell signifies the typicality of such a disorder for this pathology.

Diagnosis Influencing Factors. These factors represent all elements influencing tunnel degradation that are considered by the experts when making decisions. The associations between pathologies and diagnosis influencing factors were provided as a table. In total, 78 associations were provided by the experts.

3.2 Conceptual Model

The conceptualisation of the domain was converted into OWL ontologies [12]. Figure 4 shows the upper ontology of Tunnel with linkages to other major concepts from the domain model such as Tunnel Types, Tunnel Geo Stretch, and Pathology. The upper level also captures that a Tunnel Portion can have disorders, diagnosis influencing factors, and lining materials.

Fig. 4. PADTUN upper ontology.

Figure 5 shows the representation of pathologies and instances based on degradation types. Regarding the cause of the degradation (the origin of the problem), two general groups of pathologies were distinguished: ground degradation pathologies (if the pathologies occur in the ground) and lining degradation pathologies (if the pathologies occur in the lining) [7, 8]. Figure 6 depicts how an association between a disorder and a pathology is represented in the ontology. This example shows how the rule provided by the experts in a table (see Fig. 3) is represented in the ontology. Similarly, Fig. 7 illustrates an association between a pathology and a diagnosis influencing factor, together with other contextual information such as the level of influence.

Fig. 5. Partial representation of pathologies classification based on degradation types.

Fig. 6. Association between a pathology and a disorder (ontological representation of Fig. 3)

Fig. 7. Association between a pathology and a diagnosis influencing factor.

To facilitate the evolution of the PADTUN ontologies, they were developed as a group of smaller but interlinked modular ontologies [13]. Table 1 presents a summary of the ontological features of the PADTUN ontologies, with size, expressivity [14], and complexity of the core knowledge captured by axioms. In particular, the PADTUN ontologies utilise OWL features such as sameAs, disjointWith, and equivalentClass. The PADTUN ontologies are available online (see Footnote 1).

Table 1. PADTUN ontologies features

4 Pathology and ROI Inferencing Service

Pathology and ROI inferencing services are two central components of the PADTUN application layer.

4.1 Pathology Inferencing

Pathologies are calculated in two steps: (i) by inferring associations between disorders and pathologies; and (ii) by inferring associations between diagnosis influencing factors and pathologies.

The disorder-based pathologies component of the pathology inferencing service finds all the pathologies associated with the disorders and lining materials present in the tunnel portion under inspection and ranks the pathologies according to the typicality of the disorders. This inference uses SPARQL queries (see Footnote 2) to infer associations.

The diagnosis influencing factor-based pathologies component finds all the pathologies for the diagnosis influencing factors present in the tunnel portion under inspection and ranks them according to their influence level. Furthermore, a check is made whether all the necessary influencing factors for a pathology are present in the portion under investigation. If they are not, the pathology is removed from the final list and the ranking is adjusted accordingly. This inference uses SPARQL queries to infer associations and to check the necessary conditions.
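The necessary-factor check described above can be illustrated with a minimal Python sketch. It is not the PADTUN implementation, which relies on SPARQL queries over the ontologies; the association table, the factor names, the numeric influence levels and the function name `factor_based_pathologies` are all hypothetical, and summing influence levels is only one possible way of ranking by influence level.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical associations: pathology -> {influencing factor: influence level}.
# Names and levels are illustrative only, not taken from the PADTUN ontologies.
FACTOR_ASSOCIATIONS: Dict[str, Dict[str, int]] = {
    "Swelling": {"clay ground": 3, "water presence": 2},
    "Dissolution": {"soluble rock": 3, "water presence": 3},
}

# Hypothetical necessary factors: all of them must be present in the portion.
NECESSARY: Dict[str, Set[str]] = {
    "Swelling": {"clay ground"},
    "Dissolution": {"soluble rock", "water presence"},
}


def factor_based_pathologies(present: Set[str]) -> List[Tuple[str, int]]:
    """Rank pathologies by the summed influence level of the matched factors,
    dropping any pathology whose necessary factors are not all present."""
    ranked = []
    for pathology, factors in FACTOR_ASSOCIATIONS.items():
        matched = present & set(factors)
        if not matched:
            continue  # no associated factor observed in this portion
        if not NECESSARY.get(pathology, set()) <= present:
            continue  # a necessary factor is missing: remove the pathology
        ranked.append((pathology, sum(factors[f] for f in matched)))
    # Highest score first; ties broken alphabetically for a stable ranking.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

In this sketch, a portion with only "water presence" and "clay ground" keeps Swelling but drops Dissolution, because the necessary factor "soluble rock" is absent.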

The pseudo code of these two components is presented below. The weights (m and n in the pseudo code) were set through a series of interactions with the experts. The values m = 4 and n = 1 were found to be the best according to the experts’ judgement based on three tunnels. We validated this further with seven tunnels, and the values were found to be suitable without further adjustment.

The Cumulative pathologies component combines the results of the previous two components by aggregating the score of pathologies in both the lists (disorder-based pathology list & influencing factor-based pathology list).
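As a rough illustration of how the weights and the cumulative list could interact, the following Python sketch scores a pathology from its matched disorders and then sums the two lists. The roles assigned to m and n are an assumption (m as the weight of a typical disorder, n as the weight of a non-typical one); the function names and the example scores are hypothetical, and aggregation by simple addition is likewise assumed.

```python
from typing import Dict, List, Tuple

M_TYPICAL = 4  # m = 4 in the text; its role as "typical disorder" weight is assumed
N_OTHER = 1    # n = 1 in the text; its role as "non-typical disorder" weight is assumed


def disorder_score(matched: Dict[str, bool]) -> int:
    """Score one pathology from its matched disorders.
    `matched` maps a disorder name to True if it is typical for the pathology."""
    return sum(M_TYPICAL if typical else N_OTHER for typical in matched.values())


def cumulative(disorder_scores: Dict[str, int],
               factor_scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Combine the disorder-based and factor-based lists by adding scores."""
    totals = dict(disorder_scores)
    for pathology, score in factor_scores.items():
        totals[pathology] = totals.get(pathology, 0) + score
    # Highest cumulative score first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, a pathology matched by one typical and one non-typical disorder would score 4 + 1 = 5 under these assumptions before the factor-based score is added.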

4.2 Regions of Interest (ROI) Inferencing Service

One of the decision support aspects of PADTUN is to identify regions of interest concerning pathologies. In practice, tunnel experts intuitively abstract regions of interest and, in doing so, aggregate tunnel portions that are susceptible to the same types of pathologies with some distance approximation. However, it was not clear from the outset how the experts themselves infer ROIs once pathologies per portion were identified. Hence, a mock-up of several possible alternatives was presented to the experts in order to identify the best way of inferring ROIs. We present here the logical formalism for these alternative ways to define and calculate ROIs.

Let the top-ranked pathologies (up to a fixed rank) of an individual tunnel portion P be denoted by the observation obs(P). A region of interest R is then a continuous, homogeneous part of the tunnel consisting of a set of individual tunnel portions. The granularity of continuity is determined by the size of the gap (n) that is allowed between adjacent tunnel portions.

In addition, homogeneity in an ROI can be determined by the validity of a logical expression Φ(X) that is applied to portions X of a tunnel. The aggregation predicate \( R_{\Phi,n}(X) \) is

$$ R_{\Phi,n}(X) \equiv \forall (x_i \in X)\; \exists (x_j \in X) \left[ x_i \ne x_j \wedge \Phi(\{x_i, x_j\}) \wedge \mathrm{dist}(x_i, x_j) \le n \right] $$

where Φ(X) is one of the following predicates, which specify different possible conditions under which two tunnel portions can be aggregated:

Portions with (Approximately) Equal Observations (\( \Phi_{=} \), \( \Phi_{\approx} \)). Observations under consideration are deemed ‘equal’ when they share the same pathologies. For two portions \( p_1 \) and \( p_2 \), \( \Phi_{=} \) is defined as \( \Phi_{=}(\{p_1, p_2\}) \equiv obs(p_1) = obs(p_2) \). Observations are ‘approximately equal’ if all their pathologies are semantically similar:

$$ \begin{aligned} \Phi_{\approx}(\{p_1, p_2\}) \equiv\; & \left[ \forall (o_1 \in obs(p_1))\; \exists (o_2 \in obs(p_2))\; similar(o_1, o_2) \right] \wedge \\ & \left[ \forall (o_2 \in obs(p_2))\; \exists (o_1 \in obs(p_1))\; similar(o_1, o_2) \right] \end{aligned} $$

Portions with (Approximately) Incorporating Observations (\( \Phi_{\subseteq} \), \( \Phi_{\subseteq\approx} \)). One observation ‘incorporates’ another observation if it contains all the pathologies that the other observation has, i.e. \( \Phi_{\subseteq}(\{p_1, p_2\}) \equiv (obs(p_1) \cap obs(p_2)) \in \{obs(p_1), obs(p_2)\} \). Also, one observation is ‘approximately incorporating’ another observation if every concept in one is semantically similar to some concept in the other, so that one set of observations contains all the observations that the other has, i.e.

$$ \Phi_{\subseteq\approx}(\{p_1, p_2\}) \equiv \left[ \forall (o_1 \in obs(p_1))\; \exists (o_2 \in obs(p_2))\; similar(o_1, o_2) \right] $$

Portions with (Approximately) Overlapping Observations (\( \Phi_{\cap} \), \( \Phi_{\cap\approx} \)). One observation ‘overlaps’ another observation if it contains only some of the pathologies that the other one has and vice versa: \( \Phi_{\cap}(\{p_1, p_2\}) \equiv (obs(p_1) \cap obs(p_2)) \ne \emptyset \wedge \neg \Phi_{\subseteq}(\{p_1, p_2\}) \). Also, one observation ‘approximately overlaps’ another observation if it contains some concepts (e.g. disorders) that are semantically similar to concepts from the other observation, and vice versa, i.e.

$$ \begin{aligned} \Phi_{\cap\approx}(\{p_1, p_2\}) \equiv\; & \left[ \exists (o_1 \in obs(p_1))\; \exists (o_2 \in obs(p_2))\; similar(o_1, o_2) \right] \wedge \\ & \left[ \exists (o_2 \in obs(p_2))\; \exists (o_1 \in obs(p_1))\; similar(o_1, o_2) \right] \wedge \neg \Phi_{\subseteq\approx}(\{p_1, p_2\}) \end{aligned} $$

Portions with the Same Classification (\( \Phi_{C} \)). Two observations belong to the same classification if they both contain pathologies belonging to the same ontology class C:

$$ \Phi_{C}(\{p_1, p_2\}) \equiv \left( obs(p_1) \in C \wedge obs(p_2) \in C \right) $$
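The pairwise predicates of this section can be sketched in Python over observations represented as sets of pathology names. This is an illustrative sketch only: the similarity relation contains just the pair mentioned in the example of this section (Dissolution and Swelling) and is assumed to be reflexive and symmetric, the class assignment in `CLASS_OF` is hypothetical, and the classification predicate is read as "both observations contain at least one pathology of a common class".

```python
from typing import Dict, FrozenSet

Obs = FrozenSet[str]

# Only similar(Dissolution, Swelling) is stated in the text; reflexivity and
# symmetry of the relation are assumptions of this sketch.
SIMILAR_PAIRS = {frozenset({"Dissolution", "Swelling"})}

# Hypothetical class assignment, for illustration only.
CLASS_OF: Dict[str, str] = {
    "Mortar Ageing": "Lining Degradation",
    "Dissolution": "Ground Degradation",
    "Creep": "Ground Degradation",
    "Swelling": "Ground Degradation",
}


def similar(a: str, b: str) -> bool:
    return a == b or frozenset({a, b}) in SIMILAR_PAIRS


def covered(o1: Obs, o2: Obs) -> bool:
    """Every pathology in o1 has a semantically similar pathology in o2."""
    return all(any(similar(a, b) for b in o2) for a in o1)


def phi_equal(o1: Obs, o2: Obs) -> bool:
    return o1 == o2


def phi_approx_equal(o1: Obs, o2: Obs) -> bool:
    return covered(o1, o2) and covered(o2, o1)


def phi_incorporating(o1: Obs, o2: Obs) -> bool:
    # The intersection equals one of the two sets, i.e. one contains the other.
    return (o1 & o2) in (o1, o2)


def phi_overlapping(o1: Obs, o2: Obs) -> bool:
    return bool(o1 & o2) and not phi_incorporating(o1, o2)


def phi_same_class(o1: Obs, o2: Obs) -> bool:
    # Both observations contain a pathology from at least one common class.
    return bool({CLASS_OF[p] for p in o1} & {CLASS_OF[p] for p in o2})
```

With these definitions, {Dissolution, Creep} and {Swelling, Creep} are not equal but are approximately equal, and they overlap because they share Creep without one containing the other.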

Example. Consider a tunnel (see Fig. 8) with ten tunnel portions. The observations consisting of pathologies on each of these ten portions are given in the figure with O = {d_i, …, d_n}, where d_1 = Mortar Ageing; d_2 = Dissolution; d_3 = Creep; d_4 = Faults Degradation; d_5 = Rock Weathering and d_6 = Swelling. It is also given that d_2 and d_6 are semantically similar, i.e. similar(d_2, d_6). A domain expert can then tailor what they would like to view as a region of interest by manipulating two criteria of the aggregation function: (i) the allowed gap (n) and (ii) the predicate (Φ(X)) to use. Figure 8 shows various ROIs under different selections. For example, when the selection is n = 1 and the predicate for portions with equal observations (\( \Phi_{=} \)) is selected (first row, Fig. 8), the resultant eight ROIs are: {{p1, p2}, {p3}, {p5}, {p6}, {p7}, {p8}, {p9}, {p10}}.

Fig. 8. Result of various selections of aggregation predicates and gap. Resultant ROIs are numbered and shown as aggregations of individual portions.

A different selection (last row, Fig. 8), keeping n = 1 but changing the predicate to \( R_{\Phi_{C}} \), reduces the number of ROIs to one, i.e. {{p5, p6, p7, p8}}. Each portion in this ROI belongs to the Ground Degradation Pathology class from the PADTUN ontology. The ontological representation of this ROI is depicted in Fig. 9.

Fig. 9. Ontological representation of one of the resultant ROIs (selection: n = 1 and \( R_{\Phi_{C}} \)).
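The aggregation predicate \( R_{\Phi,n}(X) \) defined earlier in this section can be checked directly for a candidate set of portions. The Python sketch below is a literal reading of the formula, assuming that portions are identified by their position along the tunnel and that dist is the absolute difference of positions; the observations in `OBS` are hypothetical and are not those of Fig. 8.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Obs = FrozenSet[str]


def region_holds(portions: Iterable[int],
                 obs: Dict[int, Obs],
                 phi: Callable[[Obs, Obs], bool],
                 n: int) -> bool:
    """R_{phi,n}(X): every portion in X has a *different* portion in X that
    satisfies phi with it and lies within distance n.
    Assumption: dist(x_i, x_j) = |x_i - x_j| on portion positions."""
    xs = list(portions)
    return all(
        any(xi != xj and phi(obs[xi], obs[xj]) and abs(xi - xj) <= n for xj in xs)
        for xi in xs
    )


def phi_equal(o1: Obs, o2: Obs) -> bool:
    return o1 == o2


# Hypothetical observations for four consecutive portions.
OBS: Dict[int, Obs] = {
    1: frozenset({"Mortar Ageing"}),
    2: frozenset({"Mortar Ageing"}),
    3: frozenset({"Creep"}),
    4: frozenset({"Mortar Ageing"}),
}
```

Here portions 1 and 2 form a valid region for n = 1, while portion 4 can only join them when the allowed gap is raised to n = 2. Read literally, the predicate is not satisfied by a single portion, so single-portion ROIs such as {p3} in the example would have to be admitted separately.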

Finalising Aggregation Function(s) to Implement. Experts were shown a mock-up of ROIs with different selections (above). The aggregation function for portions with the same classification (\( R_{\Phi_{C}} \)) was deemed to be the most useful for decision-making and was implemented in the final version of the ROI inferencing service. Detecting regions with portions that have pathologies belonging to the same classification helps decision makers to decide on an overall approach for addressing problematic tunnel regions. Grouping affected regions according to the pathology classification is helpful in making decisions about the expertise, treatment and equipment required for maintenance. For example, infrastructure managers need to send different equipment to repair lining degradation pathologies from that needed to fix ground degradation pathologies. Similarly, different skillsets are required to repair different types of pathologies.

5 Evaluation

Overall Set-up. An evaluation was carried out to verify the correctness of the PADTUN components, namely the ontologies, pathology and ROI Inferencing components. The goal of the evaluation was to discover any issues and to identify improvements in these components. In addition, it was also important to check the correctness of the input we received from the experts. Experts provided rules as tables indicating the situations under which a particular pathology is likely to occur. The cumulative effect of these rules and whether they match the experts’ tacit judgment about pathologies is something else we aimed to capture during the evaluation.

The evaluation was conducted with tunnel experts from the project with extensive experience in diagnosing tunnels and strategic decision-making about tunnel maintenance.

Pathology Inferencing Evaluation & Results. Figure 10 shows part of the interface for the pathology inferencing component of the PADTUN system. The columns “rank” and “pathology” show each pathology and its rank. The “disorders” column shows the disorders that were present in the tunnel portion under investigation and contributed to manifesting this pathology. The colour coding shows whether the disorders are typical for the pathology.

Fig. 10. The PADTUN interface for the pathology inferencing service. It shows results of “disorders-based pathology inferencing” on a tunnel portion.

For this evaluation, 41 portions of 3 tunnels were selected in consultation with the experts. The aim was to select tunnel portions with a good variety of disorders. The experts were provided with the output of the pathology inferencing service as part of the interface (Fig. 10). They were asked to comment on the individual (disorder-based and diagnosis influencing factor-based) pathology inferencing results and on the cumulative pathology inferencing results.

The experts approved the presence of the pathologies and their ranking in all the test cases for the individual (disorder-based and diagnosis influencing factor-based) pathology inferencing. However, during discussions it became evident that, although they agreed with the individual inferencing, they were not satisfied with the cumulative calculations. We discovered that the pathologies were correctly calculated based on disorders and diagnosis influencing factors and according to the rules encoded. However, in their tacit calculations, experts always expected a pathology to be present in both lists before considering it for the cumulative list. As a result of this exercise, this cumulative list rule was added to the ontology and to the pathology inferencing service. This scenario highlights the need for domain expert involvement in testing ontologies and the resultant benefit in terms of ongoing knowledge expansion.
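The revised cumulative rule can be stated compactly: a pathology enters the cumulative list only if it appears in both the disorder-based and the factor-based list. The Python sketch below illustrates this under the assumption that the two scores are still added together; the function name and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def cumulative_in_both(disorder_scores: Dict[str, int],
                       factor_scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Keep only pathologies present in both lists, then add their scores.
    Summation is an assumption of this sketch."""
    shared = disorder_scores.keys() & factor_scores.keys()
    totals = {p: disorder_scores[p] + factor_scores[p] for p in shared}
    # Highest cumulative score first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

A pathology supported only by disorders, or only by influencing factors, is therefore dropped from the cumulative list even if its individual score is high.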

ROI Inferencing Evaluation & Results. A gold standard consisting of 3 tunnels and their respective ROIs was collected from tunnel experts. The three tunnels have different numbers of portions. Tunnel 1 is one of the smallest tunnels, with 19 portions but a higher number of pathologies. Tunnel 2 has 35 portions, some of them without any pathology. Tunnel 3 has 42 portions and a good mix of lining disorders and pathologies. The evaluation included all 96 portions. For each of these tunnels, experts provided ROIs based on pathology classification. For example:

“Portions 1 to 3 in tunnel 1 have pathologies from Lining Degradation classification; Portions 1 to 19 in tunnel 1 have pathologies from Lining Ageing degradation classification.”

Figure 11 depicts the PADTUN interface showing the overview of pathologies across tunnel portions and highlighting ROIs.

Fig. 11. The PADTUN interface showing the overview of pathologies across tunnel portions. (1) shows the regions of interest with pathologies from the same classification, e.g., (2) Lining Degradation and (3) Lining Ageing.

The output of the ROI inferencing was compared with the gold standard using the traditional information retrieval (IR) measures of precision, recall and F-measure [15]. True positives (tp: exact matches between the system list and the gold standard), false positives (fp: ROIs indicated by the system that were not in the expert list), true negatives (tn: possible ROIs that were not present in either list) and false negatives (fn: regions that were not present in the system list but were present in the gold standard) were calculated.
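For reference, the three measures follow the standard definitions: precision = tp / (tp + fp), recall = tp / (tp + fn), and F-measure as the harmonic mean of the two. The Python sketch below computes them for ROIs matched exactly, assuming each ROI is encoded as a (classification, first portion, last portion) tuple; this encoding and the system output in the usage example are hypothetical.

```python
from typing import Iterable, Tuple

ROI = Tuple[str, int, int]  # (classification, first portion, last portion)


def precision_recall_f(system: Iterable[ROI],
                       gold: Iterable[ROI]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure."""
    system_set, gold_set = set(system), set(gold)
    tp = len(system_set & gold_set)   # ROIs found by both
    fp = len(system_set - gold_set)   # ROIs only in the system list
    fn = len(gold_set - system_set)   # ROIs only in the gold standard
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

As a usage example, if the gold standard contains ("Lining Degradation", 1, 3) and ("Lining Ageing", 1, 19) and the system returns the first of these plus one ROI not in the expert list, then tp = 1, fp = 1 and fn = 1, giving a precision, recall and F-measure of 0.5 each.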

The results are summarised in Table 2. Three configurations of the ROI inferencing service are considered. In the first, a classification is considered if at least one pathology from that classification is in the top three ranks. The second configuration is more restrictive and requires at least two pathologies from a classification in the top three ranks for the classification to be considered. In the final configuration, any ROI with three or fewer portions is discarded from the analysis, ensuring that ROIs contain a substantial number of portions.
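The three configurations differ only in two thresholds, which the following Python sketch makes explicit: the minimum number of pathologies of a classification in the top three ranks, and the minimum number of portions per ROI. The sketch is illustrative rather than the PADTUN implementation: function names and class labels are hypothetical, and for brevity it groups strictly consecutive portions without any gap tolerance.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def qualifying_classes(ranked: Sequence[str], class_of: Dict[str, str],
                       min_count: int, top: int = 3) -> Set[str]:
    """Classes with at least `min_count` pathologies in the top `top` ranks."""
    counts = Counter(class_of[p] for p in ranked[:top])
    return {c for c, k in counts.items() if k >= min_count}


def class_rois(portion_classes: Sequence[Set[str]], target: str,
               min_portions: int = 1) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive portions (1-based, inclusive) whose
    qualifying classes contain `target`, keeping runs of sufficient length."""
    rois, start = [], None
    for idx, classes in enumerate(portion_classes, start=1):
        if target in classes:
            if start is None:
                start = idx          # a new run begins here
        elif start is not None:
            rois.append((start, idx - 1))  # the run ended at the previous portion
            start = None
    if start is not None:
        rois.append((start, len(portion_classes)))
    return [(s, e) for s, e in rois if e - s + 1 >= min_portions]
```

Setting `min_count` to 1 or 2 corresponds to the first two configurations, and raising `min_portions` corresponds to the third.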

Table 2. Precision, recall and F-measure results for the ROI inferencing component; (left) considering ROIs with at least one pathology from a classification in the top 3 ranks; (middle) considering ROIs with min 2 pathologies from a classification in the top 3 ranks; (right) considering ROIs with at least three portions.

The configuration with the rule that at least two pathologies of a classification need to be present in an ROI achieved the highest result on all three measures, with an F-measure of 84 %. Interpreted as the agreement between the expert list and the system list of ROIs, this is considered to be “an almost perfect agreement” [16]. The configuration restricting the minimum number of portions per ROI achieved similar performance. The least restrictive configuration fared worst, with 54 % precision.

6 Conclusions and Future Work

In this paper, we have demonstrated the application of semantic web technologies in a new domain, tunnel diagnosis and maintenance. A DSS, PADTUN, is presented that supports tunnel experts in decision-making about diagnosing pathologies and detecting continuous portions with a similar pathology spread. This was only possible with semantic web technologies, as the aggregation mechanism requires semantic reasoning over the pathology classification. The use of semantic technologies makes the framework flexible: domain experts can select larger or more granular regions with different configurations, including selecting portions with similar pathologies in the top ranks and ignoring short gaps. This flexibility allows us to work with the experts to select an ideal configuration, which is planned as immediate future work.