Introduction

Asset management and its relationship with water infrastructure modelling

Significant investments are necessary for the construction, management, and maintenance of water supply and distribution network (WSDN) infrastructure to ensure access to drinking water that meets the required standards in terms of quality and quantity for the population. These investments introduce an additional layer of complexity, encompassing various technical, economic, social, and environmental aspects within the system (Sitzenfrei et al. 2020; Kerwin and Adey 2021; Meijer et al. 2021). The foreseen deadlines for the components' end-of-life in WSDNs underscore the imperative for enhanced planning and efficient resource allocation to mitigate service level deterioration and infrastructure conditions. These concerns carry significant global implications, threatening the overall quality of provided services (Zamenian et al. 2017; Bello et al. 2019; Curt et al. 2019; Vieira et al. 2020).

The repercussions of WSDN deterioration processes, including pipe breaks, water losses, substandard service delivery, compromised water supply quality, and escalating operational and maintenance expenses, are well-documented (Bello et al. 2019). Water distribution pipes play a crucial role in water supply and distribution systems, and their failures have significant financial, social, and environmental ramifications (Fan et al. 2021). The pipe replacement rate by drinking water companies ranges from 0.5 to 4.8% per year, exemplified by the average annual renewal of 0.67% of the network length in the case of France (Office Français de la Biodiversité 2022). In comparison, the USA exhibits an average rate ranging from 1 to 4.8%, representing a one-time renewal rate. Consequently, at these rates it would take approximately 20 to 200 years to fully renew the current water infrastructure worldwide. Table 1 summarizes the infrastructure and renewal rates, water losses, leakage, and non-revenue water in various countries' WSDNs. The deterioration of pipelines significantly impacts WSDNs globally, as is evident in metrics such as water losses and their associated costs: insufficient network renewal results in deteriorating pipeline conditions and increased system failures.
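The renewal horizons quoted above follow from simple arithmetic: at a constant annual renewal rate, the time to renew the whole network is the reciprocal of that rate. The short Python sketch below reproduces this calculation for the rates cited in the text; the function name is illustrative, and the assumption of a constant rate applied to a fixed network length is a simplification.

```python
def years_to_full_renewal(annual_rate_percent: float) -> float:
    """Years needed to renew 100% of the network at a constant annual rate.

    Assumes a fixed network length and a constant renewal rate (a simplification).
    """
    if annual_rate_percent <= 0:
        raise ValueError("annual renewal rate must be positive")
    return 100.0 / annual_rate_percent


# Rates taken from the text: lower bound, France, and upper bound.
for rate in (0.5, 0.67, 4.8):
    years = years_to_full_renewal(rate)
    print(f"{rate:.2f}% per year -> about {years:.0f} years")
```

Under these assumptions, the 0.5% and 4.8% bounds give roughly 200 and 21 years respectively, and the French rate of 0.67% corresponds to about 149 years.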

Table 1 Status of WSDNs at the national level in terms of infrastructure and renewal rates, water losses and leakage, and non-revenue water

The challenges posed by water infrastructure necessitate a resilient and interdisciplinary approach to implementing an asset management system. Such a system encompasses a cohesive network of elements within an organization, defining the asset management policy, establishing asset management objectives, and formulating the necessary processes to achieve these objectives (ISO/TC 251 2014).

Asset management systems are intricately linked to water infrastructure management, operating within the framework of Infrastructure Asset Management (IAM). IAM encompasses essential activities geared towards optimizing service delivery, including inventorying, monitoring, maintenance, and renovation. This holistic approach seamlessly integrates engineering and management sciences, aligning technical aspects with usage, perception, and value considerations. Such integration enhances informed decision-making and facilitates the development of efficient management strategies. By intertwining engineering sciences with management disciplines, IAM is pivotal for Water Utilities in ensuring the long-term maintenance and adaptability of water infrastructure, effectively addressing ageing and potential obsolescence (Le Gat et al. 2023).

Modelling and optimization in the operation of structures, coupled with probabilistic modelling of structural deterioration (performance failure), represent significant and forward-looking trends, as underscored by Le Gat et al. (2023), shaping the strategic objectives of Water Infrastructure Asset Management (WIAM). These trends align with the findings of other prominent authors who have extensively researched IAM. Notable researchers in recent years include Ugarelli and Sægrov (2022), Okwori et al. (2021), Pathirana et al. (2021), El-Diraby (2021), Mazumder et al. (2021), Beuken et al. (2020), Curt et al. (2019), and Carriço et al. (2020), among others. These researchers note the adaptation of WIAM frameworks to embrace digitization and the implementation of modelling in diverse domains, as expounded in detail by Okwori et al. (2021) and Kerwin and Adey (2021). As a result, WIAM is evolving and paving the way for sustainable water infrastructure practices by incorporating asset modelling.

Modelling in water infrastructure asset management: significance and diverse outputs at various levels

Models play a crucial role in WIAM as they enable a systematic approach to understanding and optimizing the operation of water infrastructure systems. They provide valuable insights into the complex dynamics of these systems, helping to enhance decision-making processes and improve overall performance (Garzón et al. 2022).

Modelling is vital in comprehending the complex dynamics of water distribution systems, involving the intricate interplay between water, the environment, and the ageing infrastructure of WSDN. Employing data-driven models facilitates the extraction of meaningful insights from data patterns, capturing variable relationships without predetermined mechanisms or interactions. In the context of WIAM-focused WSDN modelling, historical datasets of leaks and pipe failures serve as the foundation to explore factors influencing pipe lifespan. These influential factors, termed explanatory variables, constitute this paper's primary focus of investigation.

WIAM relies on models that perform several critical roles. The first is prediction: models can estimate the performance of water infrastructure assets over time. For instance, the modelling approach in this study considers age, material, environmental and social conditions, among other factors, to predict the likelihood of asset failures, enabling the implementation of proactive maintenance and replacement strategies and mitigating potential issues. The second is optimization: models can enhance the operation and maintenance of water infrastructure assets (Ulusoy et al. 2021), identifying the most efficient asset management strategies by assessing cost, resource availability, sustainability metrics, and system performance.

Risk Assessment, through modelling, allows for evaluating potential risks related to various asset management decisions. These models can measure the potential outcomes of asset failures and pinpoint areas of significant vulnerability, thereby directing efforts towards mitigating risk (Ugarelli and Sægrov 2022). Models facilitate scenario analysis, allowing decision-makers to explore different “what-if” scenarios and their potential impacts, helping to evaluate the effectiveness of different strategies and identify the best course of action (Rulleau et al. 2020).

The type of output provided by models depends on the scope and complexity of the model and the level of the water infrastructure system being analysed. At the network level, models may yield outputs related to the overall system performance, such as predicted failure rates, estimated water loss, and energy consumption; they also offer valuable insights into network-wide asset conditions and identify critical assets that require immediate attention (Fan et al. 2023).

At the asset level, models can provide predictions specific to individual components, such as pipelines or pumps, which may include the probability of failure for each asset, estimations of remaining useful life, and optimal maintenance schedules. At the strategic level, models can offer outputs that guide long-term planning and investment decisions, identifying areas requiring infrastructure upgrades, assessing the impact of various investment scenarios, and optimizing capital allocation (Mohammadi and Amador Jimenez 2022).

Models also contribute outputs at the tactical level, facilitating day-to-day decision-making, influencing the prioritization of maintenance activities, optimizing inspection schedules, and identifying short-term risk mitigation measures (Alegre et al. 2013). Thus, models play a crucial role in WIAM by predicting asset performance, optimizing operations, assessing risks, and enabling informed decision-making (Le Gat et al. 2023). The nature of the outputs they provide varies depending on the level of analysis and the specific objectives of the asset management process. Notably, in the context of this study, the models estimating Drinking Water Pipe Failure (DWPF) outputs for WSDN systems offer outcomes that address all identified levels.

Bridging the divide: identifying gaps and addressing the necessity of a review on explanatory variables in drinking water pipe failure models

Numerous review articles have comprehensively summarized and synthesized various facets of water pipe failure modelling. Notably, Rostum (2000) provides an overview of developments up to 2000, including the seminal thesis by Eisenbeis (1994). The works of Pelletier (2000), Mailhot et al. (2000), and Mailhot et al. (2003), among others, have made significant contributions to this line of research, which was initiated by Shamir and Howard (1979) and further expanded upon by Kettler and Goulter (1985) and Andreou et al. (1987), with a primary focus on developing methodologies for enhancing the maintenance of long-established WSDNs. To provide a comprehensive understanding of the broader aspects of water pipe failure modelling, Table 2 presents notable publications that aim to encapsulate the field without explicitly delving into the definition and analysis of explanatory variables in failure processes.

Table 2 Previous reviews in the field of modelling for predicting DWPFs

Three distinct publications have examined the factors influencing failure prediction models within water infrastructure. One study explicitly investigates water quality factors (Monfared et al. 2021), while another focuses on the statistical dependence of such models on explanatory variables (Gómez-Martínez et al. 2017). Additionally, a comprehensive examination of general-level factors affecting pipe failure in drinking water networks is presented in a separate publication (Barton et al. 2019).

In their research, Monfared et al. (2021) identified nine key factors or explanatory variables related to water quality data, including pH, chlorine (including free residual chlorine, chlorine decay, and chlorine concentration), temperature, turbidity, hardness, colour, water age, alkalinity, and conductivity. The study revealed a notable gap in understanding the precise influence of these variables on failure prediction models, highlighting the need for further investigation and analysis in this area.

Gómez-Martínez et al. (2017) examined thirteen explanatory variables' impact on their models. These variables encompassed physical characteristics such as diameter, year of installation, pipe material, and environmental factors like terrain type, land use, and depth of installation. Additionally, the study included hydraulic-related variables, namely pressures, velocities, and transients, which they referred to as internal variables. The research findings indicated that incorporating various explanatory variables did not yield significant advantages. On the contrary, simplifying the models by reducing the number of variables enhanced their reliability and facilitated interpreting results from a service-oriented perspective, particularly for water utility applications.

Although Barton et al. (2019) did not directly focus on failure prediction models, their research extensively examines the impact of physical, hydraulic, and environmental factors on the likelihood of failures, specifically in drinking water pipes. The study is particularly notable for its comprehensive analysis of how these factors influence pipes, considering variations in materials and their distinct mechanical and chemical properties, which react differently to these variables. Moreover, the research underscores the significance of obtaining precise and comprehensive operational data and pipeline asset inventories to support the development of more accurate predictive models for pipeline failures and performance.

Despite the apparent contradiction between the latter two publications regarding the data required to train a model of this nature effectively, they refer to two different situations. The first study emphasizes that many variables can lead to overfitting, a situation where the model appears to fit well with the current data but fails to validate with future datasets due to the inclusion of noise variables (Belkin et al. 2019). On the other hand, the second publication stresses the importance of having a substantial and precise volume of inventory data for use in the models. These perspectives do not conflict, as a balanced number of explanatory variables should be selected based on the specific conditions of each system under analysis and the chosen model.

The significance of Barton's insightful research (Barton et al. 2022a) is apparent as it showcases the evolution of statistical modelling by categorizing these models into three distinct types: deterministic, probabilistic, and machine learning. This research emphasizes the necessity for additional research to ensure the appropriate consideration of variables and other crucial factors in the accurate selection and execution of these models.

The current literature lacks a comprehensive understanding of how explanatory variables influence water pipe failure modelling within WIAM. While previous research efforts have identified and analysed explanatory variables to some extent, there is limited exploration of their impact on DWPF modelling processes. This review is based on the identified knowledge gaps, which include the absence of clear definitions of explanatory variables tailored to local conditions and specific modelling needs in DWPF models. Additionally, researchers lack criteria to define these variables, and their influence on model selection and performance remains unclear. Addressing these gaps is crucial for guiding future research.

To bridge these gaps, this research investigates methodologies from various studies, uncovering potential biases and limitations related to choosing and utilizing explanatory variables according to the local conditions of each model. By meticulously examining the literature, this research seeks to enhance the understanding of how different researchers attribute importance to explanatory variables and their effects on model performance. By shedding light on these aspects, the review will contribute to improving water infrastructure management and the performance of pipe failure prediction models.

Motivation and aim of this review

There are significant motivations to explore this field beyond the identified gap. Firstly, a clear interconnection exists between water, energy, and food, with growing demands for these resources over time. Water plays a crucial role in all human processes as a fundamental element, underscoring the necessity to develop sustainable strategies for its utilisation (Carmona-Moreno et al. 2021). Leaks and failures in distribution systems are closely linked to pipe deterioration. Apart from the direct economic costs of repairing pipe failures, information on global Water Utilities' energy consumption is often outdated. Additionally, other factors, such as the impact of failures and leaks on water quality, pose challenges in quantification (Chen and Guikema 2020).

As reference data, it is estimated that non-revenue water (NRW) losses accounted for 9.1 × 10⁹ m³ of water volume and 3100 GWh of energy loss in WSDNs in the USA in 2018 (Chini and Stillwell 2018). A comparison with the estimated energy consumption of 5600 GWh in the water distribution systems in the USA for the year 2005 (Mostafavi et al. 2018) indicates an approximate 44% reduction in the energy consumption required for WSDN operations. This reduction demonstrates the evident efforts made by Water Utilities to improve energy efficiency, despite the increase in drinking water consumption due to population growth. Implementing better strategies for hydraulic sectorization, optimizing pumping systems, advancements in pipe failure and leakage management processes, and other exogenous factors like per capita water use reduction have contributed positively to this trend.

The NRW index is a standard criterion for evaluating water distribution system performance, particularly concerning water leakage management. This index represents the difference between the volume of water supplied to the distribution system and the volume billed to consumers (Alegre et al. 2016). When the distribution network undergoes higher maintenance levels and exhibits improved integrity, the rate of annual pipe failures decreases, consequently leading to a reduction in NRW (Güngör-Demirci et al. 2018; Ananda 2019). Roigé et al. (2020) introduced the concept of water and energy losses as critical environmental criteria, highlighting the interconnectedness of service pressure, water leakage, and subsequent energy wastage. Figure 1 provides a graphical representation of the global status of pipe failures in distribution systems, showing a direct proportionality between higher rates of pipe failures and higher NRW indices across various countries (Almheiri et al. 2021).

Fig. 1

Non-revenue water by country (%). Green tones indicate a lower NRW index; red tones indicate a higher NRW index. Data updated to the year 2021 (FP2E/BIPE 2019; OECD 2020; AEAS 2021; Go Associados 2021; Istituto Nazionale di Statistica 2021; Water and Sanitation Program 2021)
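As a minimal illustration of the NRW index described above, the Python sketch below computes the difference between the volume supplied and the volume billed, expressed as a percentage of the volume supplied. Expressing the index as a percentage matches the units of Fig. 1 but is an assumption here, as are the function name and the example volumes, which are hypothetical.

```python
def nrw_index(volume_supplied_m3: float, volume_billed_m3: float) -> float:
    """Non-revenue water as a percentage of the volume supplied to the network."""
    if volume_supplied_m3 <= 0:
        raise ValueError("volume supplied must be positive")
    if not 0 <= volume_billed_m3 <= volume_supplied_m3:
        raise ValueError("volume billed must lie between 0 and the volume supplied")
    return 100.0 * (volume_supplied_m3 - volume_billed_m3) / volume_supplied_m3


# Hypothetical utility: 1,000,000 m3 supplied and 780,000 m3 billed in one year.
print(f"NRW index: {nrw_index(1_000_000, 780_000):.1f}%")
```

For the hypothetical volumes shown, 220,000 m³ are unbilled, which corresponds to an NRW index of 22%.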

The environmental ramifications of pipe failures encompass greenhouse gas emissions resulting from non-optimized energy consumption in various water distribution processes affected by these failures and the water losses associated with such occurrences (Nair et al. 2014). In-depth studies, like those conducted by Herstein et al. (2009), outlined the economic, environmental, and social consequences of pipe production, installation, repair/renovation, and ultimate disposal. Additionally, these studies explore the implications of pumping processes, network pressure management, and hydraulic optimization in WSDN. While the current research may not cover the complete life cycle of the distribution system's diverse components (Herstein and Filion 2011), it facilitates a priori assessment of potential excess emissions attributed to network pipe failures.

Roigé et al. (2020) introduced several essential concepts related to water infrastructure, including the organoleptic perception of water, the risks associated with potential events and interruptions in drinking water service, as well as disruptions in pedestrian and motorized traffic. These parameters have a measurable impact on the prioritization of water pipeline renewal.

Although many authors have recognized water leakage in WSDN as a cause of significant social impacts (Gupta and Kulat 2018), the social aspects of the effects resulting from pipe failures, leaks, and water losses in WSDN remain relatively understudied, with limited research conducted on this subject beyond the work of Roigé et al. (2020).

Mazumder et al. (2021) evaluated economic, operational, environmental, and social consequences arising from the failure of integrated water and road segments, taking into account factors such as financial aspects (rehabilitation/renovation costs of pipes), operational indicators (service performance, hydraulic efficiency, road closures, and potential asset damage), as well as environmental and social impacts (effects on critical infrastructure, traffic, and population density).

Lee and Kim (2020) provided a summary of studies linking water leakages and other distribution network characteristics to sustainable development practices for WSDNs, encompassing economic, social, and environmental considerations. However, understanding the social implications of water leakages in WSDNs requires further investigation and exploration in the literature.

These studies consistently provide compelling evidence regarding the economic impacts of water pipeline failures. A substantial portion of a Water Utility's assets comprises pipelines, prompting numerous investigations focused on assessing the effects of water pipe failures from various angles: traffic congestion (Cunningham et al. 2021), pipeline characteristics (Mazumder et al. 2021), investment-based leakage reduction measures (Ahopelto and Vahala 2020), rehabilitation or replacement of failed water pipelines (Kleiner et al. 2010; Rahman et al. 2014), pipe replacement periods (Park 2011), and average network pressure (AL-Washali et al. 2020), among others. All the methods employed to estimate the economic impacts are based on functions and models that enable the analysis of potential benefits under various parameters and assumptions related to the technical management of the network.

Pipe failure modelling is a fundamental aspect of asset management models in WSDNs (Ugarelli and Sægrov 2022). It involves classifying pipe sections based on their likelihood of failure, which informs decision-making processes regarding renewal policies, maintenance strategies, and the identification of network sectors requiring further investigation for leak detection (Barton et al. 2021).

As mentioned in the preceding section, within the context of explanatory variables associated with water pipe failure modelling, a comprehensive literature review is required to synthesize previous studies, identify research gaps, and propose new directions for the field. To date, no study has summarized pipe failure prediction models and their relationship with explanatory variables, underscoring the necessity to gain insights into the diverse perceptions of explanatory variables that may impact these models.

Outline of the work

This review aims to comprehensively present the various models employed in water pipe failure prediction, focusing on the explanatory variables utilized in models developed over the past 15 years. The primary novelty of this review lies in its exploration of how researchers attribute significance to different explanatory variables within their failure models. By examining the relationships between variables and the characteristics of the models, potential biases in the selection or utilization of explanatory variables can be identified. Moreover, we propose the inclusion of pertinent variables essential for the accurate execution of failure models in drinking water pipelines.

This paper is structured into five primary sections. The introduction provides an overview of Asset Management, linking it to water infrastructure, and emphasizes its relevance to water utilities. It underscores the significance of modelling water pipe failures to enhance the efficiency of water supply systems' operation and planning. Moreover, the introduction identifies a knowledge gap in these models, highlighting the necessity to improve the design and analysis of their explanatory variables.

The “Review methodology and research questions” section outlines the review methodology and research questions, encompassing search terms, databases used, and exclusion criteria. The “Models for prediction of drinking water pipe failure” section introduces the main categories of DWPF models under study. The “Explanatory variables used for modelling drinking water pipe failures” section examines the critical explanatory variables that support the DWPF modelling, specifying the type of explanatory variable used, the employed model type, the model's output, and the time horizon of the model outcomes per publication. The “Conclusions and future research directions” section concisely synthesizes the findings, providing valuable contributions to the topic, and outlines potential avenues for future research before concluding the review.

Review methodology and research questions

Review methodology

This review adopts a mixed approach, incorporating the postulates of Kitchenham and Charters (2007), Snyder (2019) and the PRISMA Guidelines (Moher et al. 2009). The review includes conference proceedings, journal articles, government documents, doctoral theses, and dissertations from 2007 to 2023, sourced from five academic databases: JSTOR, EBSCO, ProQuest, Scopus, and Web of Science. Initially, a search with the phrase “Drinking water pipe failure modelling” yielded 2914 results, further refined using specific search strings and assessed against the exclusion and inclusion criteria outlined in Table 3. Following this process, the research narrowed to 103 relevant manuscripts, as depicted in Fig. 2.

Table 3 Inclusion and exclusion criteria for including manuscripts in this review
Fig. 2

The review process, PRISMA flow diagram

The definition of failure, as employed by Le Gat (2015), plays a crucial role in delineating the scope of this research. The water infrastructure under consideration is structured as a network of interconnected pipelines, and failures, encompassing leakage or breakage, typically manifest in clusters that affect specific network segments.

Research questions

This paper aims to conduct a meticulous literature review on water pipe failure models, primarily focusing on the foundational explanatory variables underpinning their analyses and outcomes. The study's objectives are framed by eight research questions (RQ), as outlined in Table 4. These questions define the study's scope and provide a roadmap for gathering and analysing relevant information, enabling the exploration of challenges and potential advancements in this domain. The Introduction section addresses the first three questions, setting the main trajectory of the review and justifying its necessity while establishing its link to asset management.

Table 4 Research questions

Models for prediction of drinking water pipe failure

A brief overview of modelling for the prediction of drinking water pipe failure (DWPF)

Employing a water pipe failure prediction model enables estimating future break/failure events based on historical observations, determining an appropriate renewal rate, and supporting decision-making processes related to key indicators, renewal scenarios, and selecting pipes for replacement. In recent decades, water pipe failure modelling has become a valuable tool for analysing failure data collected from WSDNs, serving as a standard planning approach to investigate potential causes of pipeline failures. Initially, Water Utilities relied on expert judgement to characterize failure events, considering factors such as pipe age and applied pressure. However, the evolution of this approach has aimed to enhance the information captured regarding network failures and improve analysis through various statistical techniques.

The progress in failure event studies has facilitated the identification of specific scenarios and requirements, leading to more effective proposals for network renovation or reinforcement. By improving the quantity and quality of collected information, new indicators can be generated, and existing ones refined, thus impacting failures. Water Utilities are increasingly focused on predicting pipe failures or deficiencies before they happen, evident in the analysis of failure rates per pipe. This involves considering factors related to existing defects, such as previous failures, leakages, and ageing, as well as potential improvements, to prioritize and select pipes for renewal based on their probability of failure. However, a significant challenge in implementing this approach lies in the availability of comprehensive historical data regarding WSDNs, which includes factors such as pipe material, location, age, and failure history.

The emergence of machine learning techniques has significantly enhanced statistical models focused on studying pipe failure phenomena. This development proves beneficial in addressing the limited availability of historical data faced by Water Utilities when investigating trends and probabilities associated with pipe failures. Using sophisticated algorithms, machine learning techniques can effectively estimate missing data by leveraging the stochastic nature of the missing values within the WSDN information dataset. Nonetheless, applying these techniques demands a high level of reliability in the historical information of the WSDN to ensure it adequately represents the entire data universe.

While this paper exclusively focuses on water pipe failure models for WSDN and their associated explanatory variables, it is crucial to acknowledge the progress made in pipe-related research for other purposes. This includes investigations into the mechanical or rheological properties of piping materials. Although such studies fall outside the scope of this paper, it is worth noting that the models presented herein draw upon the knowledge generated in these areas to enhance the predictive capabilities of failure models within the context of multi-criteria analysis conducted by each analysed model.

Classifications of models applied to the investigation of pipe failures in drinking water systems

In recent decades, there has been considerable interest in predicting DWPF as WSDNs gradually deteriorate. Building upon the methodologies employed in previous reviews (Dawood et al. 2020b; Karimian et al. 2021), this study updates the list of recent research on DWPF models in Table 5. The table systematically categorizes and provides a concise summary of all the models employed by researchers in the last fifteen years, directly addressing research question (RQ4) (Table 6). The information presented in this table has been extracted from all the sources listed in Table 7.

Table 5 Models applied to analyse and predict pipe failures in drinking water systems, compilation to date
Table 6 Key features compendium of relevant recent research in pipe failure prediction models

The main classifications are derived from the significant differences observed in the approaches commonly used in the analysed research. Statistical-based models seek to establish relationships between variables through mathematical equations (Fahrmeir and Tutz 2001). In contrast, Machine Learning-based models take a different approach, utilizing algorithms to learn from the data and establish these relationships (ICAMLDA 2010). Other proposed classifications deviate from these definitions, exploring alternative paths for failure estimation.

Comments on the limitations of current models

The necessity of utilizing historical pipeline inventories and failure data gives rise to a well-recognized issue of left-truncation in executing statistical-based survival models, owing to the nature of event-related data over time. This phenomenon, along with right-censoring, has been extensively examined by Le Gat (2015) and other authors (Robles-Velasco et al. 2021; Barton et al. 2022b), underscoring the significance of mitigating potential interferences in predictions and time estimations to determine the first failure (Xu and Sinha 2021). As aptly pointed out by Scheidegger et al. (2015), left-truncation and survival selection are prevalent features in the available data concerning urban water services. Nonetheless, a comprehensive analysis of the impact of these features on modelling processes is lacking in most reviewed studies, and their direct consideration remains limited to date.
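To make the effect of left-truncation and right-censoring concrete, the following Python sketch implements a product-limit (Kaplan-Meier type) survival estimate in which each pipe enters the risk set only at the age when its observation began. This is a generic textbook estimator rather than the method of any study cited above, and the record layout and the pipe data are hypothetical.

```python
from typing import List, Tuple

# (entry_age, exit_age, failed): ages in years; all values below are hypothetical.
# entry_age: pipe age when failure recording began (left-truncation).
# exit_age: age at first recorded failure, or at the end of observation.
# failed: False means the pipe was right-censored (no failure observed).
Record = Tuple[float, float, bool]


def survival_curve(records: List[Record]) -> List[Tuple[float, float]]:
    """Product-limit estimate of survival to first failure with delayed entry."""
    for entry_age, exit_age, _ in records:
        if entry_age >= exit_age:
            raise ValueError("each pipe must be observed over a positive age span")
    event_ages = sorted({exit_age for _, exit_age, failed in records if failed})
    survival = 1.0
    curve = []
    for age in event_ages:
        # A pipe is at risk at this age only if it was already under
        # observation and had neither failed nor been censored before it.
        at_risk = sum(1 for entry, exit_, _ in records if entry < age <= exit_)
        failures = sum(1 for _, exit_, failed in records if failed and exit_ == age)
        survival *= 1.0 - failures / at_risk
        curve.append((age, survival))
    return curve


records = [
    (0, 12, True),
    (5, 20, True),
    (10, 25, False),
    (15, 20, True),
    (18, 30, False),
    (0, 8, False),
]

truncation_aware = survival_curve(records)
# Naive variant: pretend every pipe was observed from installation (age 0).
naive = survival_curve([(0, exit_age, failed) for _, exit_age, failed in records])

print("with delayed entry:", truncation_aware)
print("ignoring truncation:", naive)
```

In this small example, ignoring left-truncation inflates the risk set at early ages and yields a higher survival estimate at age 20 (0.40 instead of about 0.33), which illustrates the kind of bias the cited authors warn about.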

Machine learning models offer several advantages over statistical models, including greater robustness to outliers and the capacity to establish meaningful relationships between explanatory variables and pipe failures, thus defining the significance of each variable in the modelling process. Despite the growing popularity of machine learning-based models, it is crucial to acknowledge that they may struggle to incorporate right-censored information, potentially leading to overestimating pipeline faults beyond their actual occurrences (Snider and McBean 2021).

There are other disadvantages associated with machine-learning-based methods. One of the main drawbacks is that these methods, by nature, lack physical constraints. Unless they are explicitly imposed with specific conditions, they do not inherently consider the limitations of the physical environment. As a result, it becomes necessary to critically assess the results of such models, given the challenge of internally verifying their implementation.

Another intricate issue with such models is their interpretability (Barton et al. 2022a). These models can become effective with interpretability, as human interpretation involves considerations beyond the technical proficiency of the modelling process. Commonly used techniques, like SHAP or LIME, are employed in other fields to explain predictions made by such models; nevertheless, in this field, few studies utilize these methods (Fan et al. 2021). The outcomes of these analyses sometimes align with expected or observed results in the field, and they rely on an in-depth understanding of the variables' components and effects specific to each environment and network.

Standard metrics such as RMSE and MCC may not inherently reflect the practical value of projections made by a machine-learning-based model concerning service needs. For example, a model may achieve high scores on standard metrics yet perform suboptimally when the Lorenz Curve analysis is applied (Le Gat 2015), which illustrates the impact of the models on renewal needs.
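A Lorenz-type evaluation asks a question that is closer to renewal practice: if the utility replaces only the pipes the model ranks highest, what share of the observed failures would have been avoided? The sketch below computes one point of such a curve. The function name `share_of_failures_captured` and the sample numbers are hypothetical, and ranking by pipe count rather than by cumulative length is a simplifying assumption.

```python
def share_of_failures_captured(scores, failures, top_k):
    """Fraction of observed failures on the top_k pipes ranked by model score."""
    if len(scores) != len(failures):
        raise ValueError("scores and failures must have the same length")
    if not 1 <= top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the number of pipes")
    total = sum(failures)
    if total == 0:
        return 0.0
    # Rank pipe indices from the highest predicted risk to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(failures[i] for i in ranked[:top_k]) / total


scores = [0.9, 0.1, 0.8, 0.2, 0.3]   # hypothetical model outputs per pipe
failures = [2, 0, 1, 0, 1]           # hypothetical observed failure counts
captured = share_of_failures_captured(scores, failures, top_k=2)
```

Evaluating the function for increasing values of `top_k` traces the full curve; a model that is useful for renewal planning rises steeply at small `top_k`.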

Among the advantages and disadvantages outlined by Barton et al. (2022a) and Almheiri et al. (2020a), the necessity for hyperparameter tuning in certain cases renders the implementation of such models highly demanding. Furthermore, it is essential to consider the computational power requirements highlighted by Gupta and Segal (2022), as the choice of model type depends on the service needs, the scope of the DWPF modelling projects, and the utility of their outputs for stakeholders.

In addition to variable types, time and spatial frameworks, the type of response and level of inference, as detailed by Barton et al. (2022a), the focus of the model application may also influence the models’ implementation conditions. For instance, a model aimed at long-term planning would include distinct modelling capabilities, with survival analysis models being commonly selected in such cases.

The nature of data related to pipeline failure events is inherently unbalanced, as in most cases, only 0.1% of the data universe contains one or more failures (Barton et al. 2022b). This significant imbalance underscores the requirement for comprehensive records of failures over an extended period to ensure that the limited data can offer sufficient representativeness for unbiased analysis. An insufficient number of periods considered may lead to poorly fitted machine-learning-based models, where the model lacks the information it needs and exhibits either high bias (underfitting) or excessive variance. This limitation is also evident when there is an inadequate amount of data for each feature in the analysis, as it is essential to have sufficient training instances to adjust the models effectively.

A prospective area for future research entails addressing the limitations related to data acquisition within Water Utilities, aiming to ascertain the appropriate proportions and data volumes required to implement a failure model effectively. Investigating this direction would encompass various factors, such as the network's characteristics, the availability of inventories or changes within the system, and the users' specific requirements.

Data imbalance constitutes one of the most significant challenges faced by various service operators, as collecting this information was only integrated into Water Utilities' procedures relatively recently. Addressing the issue of data imbalance can be achieved through the utilization of synthetic samples (Robles-Velasco et al. 2023). This approach proves beneficial when faced with limited information, thus mitigating the challenges posed by significant imbalances in the data. Caution must be exercised to avoid generating excessive synthetic samples, which may lead to suboptimal model performance.
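The idea behind synthetic sampling can be sketched as interpolation between existing minority-class records (pipes with failures). The example below is a deliberately simplified variant that pairs records at random rather than by nearest neighbours, so it should not be read as the procedure used in the cited study; the function name `synthesize_minority` and the two illustrative features (age in years, length in metres) are assumptions.

```python
import random


def synthesize_minority(minority, n_new, seed=0):
    """Create n_new synthetic records by interpolating between random pairs."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two distinct failed-pipe records
        t = rng.random()                 # position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical failed pipes described by (age in years, length in metres).
failed_pipes = [(45.0, 120.0), (60.0, 300.0), (38.0, 80.0)]
new_records = synthesize_minority(failed_pipes, n_new=5)
```

Keeping `n_new` moderate reflects the caution raised above: every synthetic record repeats information already present in the few real failures, so heavy oversampling adds no new evidence.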

The optimization of calibration thresholds plays a crucial role in mitigating the impact of utilizing imbalanced data by striking a balance between sensitivity and specificity (Esposito et al. 2021; Barton et al. 2022b). Additionally, sampling methods, such as stratified sampling, can address this imbalance and ensure adequate representation of materials in both the training and test datasets (Winkler et al. 2018; Barton et al. 2022a).
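One common way to balance sensitivity and specificity is to choose the probability cut-off that maximizes Youden's J statistic (sensitivity + specificity - 1). The sketch below assumes this criterion for illustration; the reviewed studies may use other criteria, and the function name `best_threshold` and the sample values are hypothetical.

```python
def best_threshold(probs, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):
        # A pipe is flagged as a predicted failure when its probability >= t.
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


probs = [0.02, 0.03, 0.05, 0.10, 0.30, 0.40]   # hypothetical model outputs
labels = [0, 0, 0, 1, 0, 1]                    # 1 = pipe failed
threshold, j_stat = best_threshold(probs, labels)
```

In this toy example the selected cut-off (0.10) lies far below the conventional 0.5, which is typical when failures are rare and predicted probabilities are small.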

Despite the assertion that machine-learning-based models do not effectively address the issue of data imbalance, leading to lower accuracy in failure prediction (Robles-Velasco et al. 2021), it is worth exploring the solutions proposed by Kaur et al. (2020). Chen et al. (2019) suggested increasing the spatial resolution of the data, resulting in a loss of accuracy in predicting non-failure events but an improvement in failure predictions.

An essential aspect of model generation lies in the necessity for expert knowledge to define model parameters and conduct data preparation processes. This expertise allows for assessing the models' predictions concerning real-world field conditions. Moreover, expert knowledge facilitates the inclusion of external factors that may be challenging to quantify and integrate into the models, leading to manual parameter adjustments (Barton et al. 2022b) to align the results with the specific context of the Water Utility concerning budget and strategy.

Given the diverse characteristics of each WSDN inventory, direct comparisons of model performance metrics across different networks become problematic because these metrics heavily depend on the quality and size of the inventory on which the models are based (Robles-Velasco et al. 2021). While these inventory differences hinder the ability to compare metrics directly, it is possible to compare different network models under specific conditions where data availability limitations can be overcome.

In cases where complete and reliable inventories are available as the base data for the models, it becomes feasible to compare different network models. To achieve this, algorithms purely based on this comprehensive data are utilized. Additionally, in the case of machine-learning-based models, comparisons are possible when the same algorithm, configured with the same parameters and hyperparameters, is applied. Through such rigorous comparative approaches, the metrics of these models can be effectively compared.

Expert criteria are paramount in model development, particularly in aiding the selection of the appropriate model type to align with the specific needs of the WSDN system. Understanding the stakeholders' requirements necessitates collaboration between system operators and model developers, mainly when they belong to different domains. For machine-learning-based models, selecting hyperparameters presents a complex task, and their variability can significantly impact the model's outcomes. This selection process is computationally expensive, as the effects of hyperparameters become evident only after executing all model processes. Therefore, validation and feedback from system experts are crucial.

While specific tools exist to optimize hyperparameters and enhance model performance automatically, they have yet to deliver optimal results in environments with highly unbalanced data (Czako et al. 2021), which is characteristic of our particular case study.

By defining the weights of each criterion, these experts can tailor the model to the specific requirements of each WSDN (Assad and Bouferguene 2022). The absence of expert judgement may lead to the inclusion of irrelevant explanatory variables that do not influence pipeline failure processes, resulting in a suboptimal selection of a reference model or a misinterpretation of model outcomes. Such misinterpretations can lead to misguided investments directed towards infrastructure within their amortisation period (Almheiri et al. 2021).

Robles-Velasco et al. (2021) assert that machine-learning-based models with more precise pipeline failure predictions tend to exhibit reduced non-failure forecasts. Barton et al. (2022b) explain how this challenge can be addressed through appropriate threshold optimization—a critical decision-making step unique to each Water Utility, necessitating expert judgement.

An inherent challenge in model generation and optimization is the selection of suitable explanatory variables. One approach to tackle this is by analysing small groups of variables to assess their significance (Robles-Velasco et al. 2021). In the case of medium-sized WSDNs, reducing the number of explanatory variables helps mitigate the risk of model overfitting (Jenkins et al. 2015).

Establishing the number of explanatory variables is a priority process (Fan et al. 2014), and it can be distilled into three constraints: the relevance and redundancy of the variable, the availability of computational resources and time for model execution, and the interpretability of the model outcomes enabling the understanding of complex relationships between the variables involved. The evident necessity for a comprehensive study encompassing the explanatory variables used in the models further reinforces the rationale behind this research.

Table 6 summarizes critical features from selected studies on the subject. It outlines their outputs' scope, objectives, and nature and presents meaningful insights specific to each case study.

Evolving trends in modelling for predicting DWPFs

The subsequent three figures and their corresponding concepts are based on developments derived from the open-source “bibliometrix” R-package (Aria and Cuccurullo 2017). Figure 3 depicts the conceptual structure of the research topic, highlighting the interconnections between the concepts used in the titles and abstracts of the relevant manuscripts. The size of the circle and the text in the figure represent the current importance of each concept, as determined by applying the Fruchterman–Reingold algorithm (Aria and Cuccurullo 2017).

Fig. 3 The conceptual structure of the modelling of DWPF for the last 15 years

The figure reveals a lack of uniformity in the use of terms such as “water main,” “pipe,” and “pipelines,” as well as “breaks” and “failure”. The analysis suggests that five key concepts unify the research into distinct clusters, namely the analysis of pipeline failures and infrastructure asset management through machine learning and modelling.

Figure 4 highlights the most frequently used keywords in the studied manuscripts over the past 15 years, illustrating their evolution. Notably, the term “corrosion,” which was once considered a crucial explanatory variable, has seen a decline in usage, while “replacement” and “patterns” have also diminished in popularity. Conversely, terms associated with applying artificial intelligence and machine learning methods are gaining prominence.

Fig. 4 The presence of research keywords from 2007 to 2023. The blue line's length represents the years in which the publications mention the keywords. The position of the blue ball indicates the year with the highest frequency of mentions for each keyword, and the size of the ball reflects the frequency of mentions in that particular year

The significance of critical concepts is depicted in Fig. 5 using the Sankey diagram (Aria and Cuccurullo 2017). The diagram illustrates the evolution of these concepts, with the size of the rectangles representing their relevance during the specified period. Notably, the “Statistical analysis” concept has not seen new developments in the last two years. On the other hand, “Asset management” and “Pipe Failure” concepts have paved the way for “Machine learning” and “Data Mining.” Additionally, the diagram indicates that Bayesian model averaging has not seen recent advancements. This analysis confirms the paramount importance of machine learning techniques in the research topic, effectively addressing RQ5.

Fig. 5 Evolution of critical concepts researched for the periods 2007–2015 (left), 2016–2019 (centre) and 2020–2022 (right)

The term “Data mining” does not signify a new trend but rather its repeated mention in articles involving machine learning techniques. The interpretation of Fig. 5 reveals that the focus of applied research has shifted from using complex statistical models to harnessing sufficient computational power for implementing and testing various machine learning techniques. This trend is corroborated by the work of Barton et al. (2022a), which outlines the evolution of DWPF models from statistical applications to the adoption of machine-learning-based models, as illustrated in Fig. 6.

Fig. 6 Comparison of trends in DWPF model types between 2007 and 2022. The graph illustrates a substantial increase in the utilization of Machine Learning-based models in the last four years (on the left). Among these machine learning models, those employing supervised learning techniques have experienced a remarkable peak (on the right), surpassing models based on statistics, particularly those with regression analysis

This trend overlooks the significant potential that statistical models hold. As previously noted, statistical-based models directly incorporate external knowledge into the model, a crucial difference from machine learning-based models that require adjustment parameters governing the incorporation of superficial knowledge (Binder 2014). Such models need more extensive development, which might not be replicable in other systems without a similar adjustment phase.

It is essential to acknowledge that the concepts depicted in Fig. 5 do not solely dictate the current and future trends in model evolution. With advancements in modelling techniques, there is a greater capacity to analyse uncertainties and sensitivities associated with the employed variables.

Most studies that address modelling failure for drinking water pipes and uncertainties are predominantly based on Monte Carlo simulations (Beh et al. 2017; Jafari et al. 2021; Fan et al. 2023) and Poisson distributions (Xu et al. 2018). However, these methodologies have also been extensively employed in various related disciplines, such as hydraulic modelling (Braun 2019), optimization in water infrastructure planning (Beh et al. 2017) and flood damage assessments (de Brito et al. 2019; Morita and Tung 2019) among numerous other applications, which illustrates the importance of analysing uncertainties in modelling processes.
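The way Monte Carlo simulation and Poisson-distributed failure counts combine to express uncertainty can be sketched as follows. All inputs are hypothetical: the annual failure rate per kilometre is assumed to be known only as a uniform range, and the function names `poisson_draw` and `simulate_annual_failures` are introduced here for illustration only.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson count with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_failures(rate_low, rate_high, length_km,
                             n_draws=10_000, seed=42):
    """Propagate an uncertain failure rate (failures/km/year) to yearly counts."""
    if not 0 <= rate_low <= rate_high:
        raise ValueError("expected 0 <= rate_low <= rate_high")
    rng = random.Random(seed)
    draws = sorted(
        poisson_draw(rng, rng.uniform(rate_low, rate_high) * length_km)
        for _ in range(n_draws)
    )
    return {
        "mean": sum(draws) / n_draws,
        "p05": draws[int(0.05 * n_draws)],   # lower end of the 90% band
        "p95": draws[int(0.95 * n_draws)],   # upper end of the 90% band
    }


result = simulate_annual_failures(rate_low=0.1, rate_high=0.3, length_km=50.0)
```

Reporting the band between `p05` and `p95` alongside the mean gives stakeholders a range of plausible yearly failures rather than a single figure.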

These concepts continuously evolve and provide fertile ground for extensive research and exploration. Research predicting or forecasting events based on highly unbalanced data, such as in our case, emphasizes the utmost importance of data quality. Hence, mitigating the influence of uncertainty throughout all stages of the modelling process will not only enhance the reliability of the models but also minimize potential biases inherent in the analysis (Fan et al. 2023).

Explanatory variables used for modelling drinking water pipe failures

Table 7 in Appendix 1 summarizes the explanatory variables extracted from the publications from 2007 to early 2023. This table establishes a link between the model typology proposed in Table 5 and the models implemented in each study, enabling an analysis of the selected model trends. Additionally, it outlines the optimal performance exhibited by each model in the respective studied network. The table also indicates the type of output generated by each study, encompassing failure probabilities, estimated failure times, failure prediction focus, and the creation of performance or failure-related risk indexes and curves.

Table 7 also presents a detailed account of the primary explanatory variables employed in the modelling process. It delineates the type of explanatory variable utilized, the corresponding model output, and the time horizon for the model outcomes in each publication. Through this comprehensive analysis of 103 studies, the list of covariates or explanatory variables reveals the key factors influencing drinking water pipe failures, effectively addressing RQ6. It is important to note that research in this field remains limited, primarily due to the intricacies posed by the unique local conditions of each WSDN. Consequently, there is a demand for in-depth investigations at the WSDN level to discern the impact and interrelationships among variables.

Among the noteworthy studies, Konstantinou and Stoianov (2020) stand out for their meticulous examination of explanatory variables beyond the fundamental analysis typically observed in most publications. Additionally, Robles-Velasco et al. (2020) have provided valuable insights, leading to the proposed organizational framework of variables based on different factors, as outlined in Table 7. These reference studies have significantly contributed to advancing understanding in this domain.

Kerwin et al. (2020) investigated the relationship between variables in sixteen studies, with only one study (Amaitik and Amaitik 2008) employing “time between failures” as an explanatory variable. The insights gathered from these diverse studies unequivocally demonstrate that pipes previously afflicted by failures exhibit a heightened vulnerability to subsequent failures. “Time between failures,” also known as “inter-failure times”, emerges as a central variable with a profound impact on the probability of failure (García-Mora et al. 2015; Le Gat 2015), attributable to its stochastic nature within the process. Thus, the inclusion of this variable becomes indispensable in forthcoming research endeavours.

The implementation of process variables varies across different studies; some opt to segregate variables based on their time dependence (Konstantinou and Stoianov 2020), while others apply all variables in one scenario and prioritize variables based on their linearity in another (Chen et al. 2019), exemplifying the importance of judiciously selecting and incorporating variables in failure modelling. It also underscores the uniqueness of failure models, necessitating bespoke approaches to suit the sensitivities of each WSDN. Such meticulous customization ensures that failure models align precisely with the complexities of the individual WSDN.

Before the advent of studies predicting the failure of drinking water pipes, the age of the pipe held paramount significance when defining renewal requirements. Counterintuitively, the wear of materials due to age does not always emerge as the most critical variable in failure processes; instead, factors such as pipe length and material composition often prove more crucial (Almheiri et al. 2020b). Several sensitivity analyses elucidate the importance of variables and their interactions within specific contexts, prompting the grouping of variables based on their relevant application environment. This categorization provides valuable insights into the criteria influencing the decision to incorporate each variable, addressing RQ7.

Limited research has been conducted to explore the significance or impact of the identified variables on pipe failure processes, which would facilitate the correlation of these variables with risk factors associated with either inherent system functioning or external conditions. In a noteworthy study, Barton et al. (2022b) estimated variable influences and confirmed that, under the particular conditions of their study, the number of previous failures holds the most dominant influence, followed by pipe length and soil moisture deficit. With a different approach, Fan et al. (2021) found that the time interval (related to the number of previous failures) and the ambient temperature were the most critical factors, followed by the incidence of cold days, pipe length, and hot days as significant contributors to the pipe failure process.

Prevalent explanatory variables

A fundamental objective of studies in this domain is to discern the critical explanatory variables that potentially drive variations in the frequency of failures in both the short and long terms. Furthermore, it is essential to determine how these variables can elucidate past observations and enable accurate predictions. The identified variables generally fall into three categories of paramount importance. Firstly, there are those frequently mentioned in most studies, such as the physical characteristics of the pipes, which encompass material composition, age, length, and diameter. These variables benefit from an abundance of data records provided by Water Utilities. Examining the prevalence of physical factors and their ubiquitous inclusion in nearly all studies can be readily justified by the traceability of historical and contemporary data associated with this category. Incorporating pipelines as assets within geographic information systems coupled with hydraulic models for efficient network management renders this information essential to handle for service-providing companies.

Most studies consistently highlight the substantial influence exerted by this group of variables on pipe failures (Robles-Velasco et al. 2021). The extensive impact of these variables on the modelling processes can introduce significant biases in algorithms and hinder the interpretation of results. For instance, a pertinent example is when a model indicates that a more extended pipe section correlates with a higher probability of failure (Jafari et al. 2021), linking this probability to factors such as traffic and operational stress (Mesalie et al. 2021). Numerous investigations affirm that length and the number of previous failures stand out as the most significant variables (Barton et al. 2019, 2022b; Robles-Velasco et al. 2023). It is worth contemplating the normalization of explanatory variables, including length, during the data pre-processing stage and considering whether the normalized data adheres to a normal distribution. If not, alternative statistical distributions should be employed for standardization processes.
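Two simple pre-processing options for the length variable can be sketched as follows: standardizing the logarithm of length, which is a common choice when raw lengths are strongly right-skewed, and expressing failures per kilometre so that long and short pipes become comparable. Both the log transform and the function names are assumptions made for illustration, not a procedure prescribed by the cited studies.

```python
import math
import statistics


def standardize_log_length(lengths_m):
    """Z-score of log(length); assumes every length is strictly positive."""
    logs = [math.log(x) for x in lengths_m]
    mean = statistics.fmean(logs)
    spread = statistics.pstdev(logs)
    if spread == 0:
        raise ValueError("all lengths are identical; nothing to standardize")
    return [(v - mean) / spread for v in logs]


def failures_per_km(n_failures, length_m):
    """Failure count normalized by pipe length in kilometres."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return n_failures / (length_m / 1000.0)


z_scores = standardize_log_length([10.0, 100.0, 1000.0])
rate = failures_per_km(3, 1500.0)
```

Whether the transformed values are close enough to a normal distribution should still be checked on the actual inventory, for instance with a histogram or a normality test, before relying on z-scores.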

The number of documented previous failures serves as an operational parameter derived from the influential work of Le Gat and Eisenbeis (2000) and has demonstrated its significance in various model applications. Lastly, researchers often select additional variables tailored to the specific conditions of the studied networks. For instance, seismic activity is relevant for networks in regions susceptible to seismic events.

Physical conditions

Pipe materials constitute one of the most crucial variables influencing failures in drinking water pipes. Different materials exhibit distinct responses to changes in soil conditions and corrosion effects, necessitating the segregation of pipe material groups in failure models (Kabir et al. 2015b). As Barton et al. (2022b) stated, the decision whether to separate or group various material types in the model inputs has been associated with differences in failure mechanisms based on material type. Nevertheless, it has been proposed that a global model, one that incorporates all materials together in the analysis, would be a more realistic approach, considering the influence of variables on all materials, the seasonal variation in failure rate by material, and the lower failure rate of certain materials, which leads to convergence failures in the models. Robles-Velasco et al. (2020) indicate that a global pipeline model performs effectively, surpassing many models that segment their implementations based on material.

According to Nugroho et al. (2022) and Dawood et al. (2020b), introducing relatively new materials, such as various densities of polyvinyl chloride, and the limited availability of historical data pose challenges in the implementation of modelling compared to using data from older pipelines with different materials. These assertions stem from a survey that analysed failures in various materials for 1992 and 1993. However, historical data on plastic materials may be insufficient for these years, raising questions about the validity of such claims.

The mechanisms leading to pipe failures, which vary based on the material, are well-documented and can be attributed to factors such as poorly manufactured pipes, improper installation, excessive operating conditions, or third-party damage (Mohammadi and Amador Jimenez 2022; Nugroho et al. 2022). These failure mechanisms may also include susceptibility to corrosion, degradation, and structural weaknesses. Nonetheless, it is essential to acknowledge that each network's specific operational, maintenance and environmental conditions introduce variability, preventing the generalization of observed behaviour for each material. Furthermore, the diversity in installation processes worldwide further impedes the standardization of failure causes.

An illustrative example of the variability in failure behaviour based on materials is evident when comparing results from different studies involving diverse networks and materials. For instance, Robles-Velasco et al.’s (2021) study reveals that asbestos-cement and cast-iron materials exhibit inferior performance. Conversely, Martínez García et al.’s (2021) study indicates that in comparison with asbestos-cement pipes, ductile-iron and PVC pipes exhibit higher failure rates. These findings indicate that the failure rates of specific materials can vary significantly depending on the characteristics of the case in consideration.

Several authors indicate that older pipes are expected to have significantly higher failure rates, with studies suggesting that pipe age has the most substantial influence on failure risk (Dawood et al. 2022). Regardless, in relatively young networks, age only emerges as a significant factor (Liu et al. 2022). In other cases, age is an important explanatory variable but not the most decisive in failure processes (Jafari et al. 2021; Assad and Bouferguene 2022).

According to Nugroho et al. (2022), the precise factors influencing the relationship between pipe age and failure rate have yet to be established. Variations might be attributed to differences in the quality and strength of the materials used. Some studies differentiate between the age of the pipe sections and the age of the connections and fittings. This difference may be significant at an operational level, since the connections may have a different age than the sections and present distinct failure patterns, such as installation errors, whereas the pipe sections are more influenced by other failure processes.

Among the studies analysed, only one identifies length as one of the least critical variables (Almheiri et al. 2021). Most studies agree that pipe length is either the most crucial or one of the most significant variables. However, this assertion necessitates careful examination, as it is subject to essential conditioning factors. Longer pipelines are more exposed to physical risks such as loads from busy roads, varying soil and geological conditions, and an increased number of accessories. The probability of failure therefore increases with length, but this does not necessarily mean that length is the most decisive variable. Standardizing variables thus becomes a fundamental process to avoid biases in the modelling process. Nevertheless, it is essential to recognize that length remains a fundamental factor.

The relationship between pipe length and failures is significant, influenced by various physical–chemical phenomena affecting the pipe, such as the Soil Moisture Deficit (Barton et al. 2022b). These effects become more prominent in long sections, including new installations and derivations, considering that longer pipe lengths often entail more accessories, thereby increasing the likelihood of failure (Mohammed et al. 2022). Additionally, terrain variability, factors like roads along the pipe route and pressure variations, can further influence failure probabilities over an extended pipe length.

In certain studies, researchers opted to omit pipe sections shorter than 0.5 m during data pre-processing to streamline the cleaning process of raw data (Robles-Velasco et al. 2023). Other methodologies focus on identifying short pipeline segments for specific repair and replacement, avoiding the simultaneous replacement of entire kilometres of pipelines (Barton et al. 2022b). However, in the datasets of some studies (Almheiri et al. 2021; Barton et al. 2022b), only pipes with a minimum length of 2 m are considered, and the same concept of cleaning the raw data of shorter pipe sections is applied.

Environmental conditions

As previously mentioned, soil corrosiveness is a significant environmental variable influencing water pipe integrity, particularly affecting susceptible pipe materials like steel (Kimutai et al. 2015). Soil movements and shrink–swell phenomena are considered less critical variables (Barton et al. 2022b). Even so, these phenomena still need to be explored at the entire network level or for a more extensive pipeline dataset. Their long-term impacts under varying conditions may exceed what is currently understood from laboratory-based studies. Further investigation is warranted to assess their implications in real-world scenarios comprehensively.

A notable characteristic of each WSDN originates from researchers' diverse interpretations of climatic variations (Laucelli et al. 2014). A prime example is the influence of winter conditions in regions experiencing ground freezing. In such cases, it becomes essential to distinguish failure events between warm and cold seasons to avoid introducing biases (Harvey et al. 2014). Nevertheless, a comprehensive examination of the relationship between mean severity, installation, operation, maintenance conditions, and pipe failures is still needed to adequately cater to specific local requirements and address these local complexities. Further research should be directed towards addressing these aspects.

Operational and management conditions

A variable that profoundly influences pipe failure processes yet is often overlooked in most studies is the time interval between failures (Robles-Velasco et al. 2023). Whether the time between failures or since the last failure is short or long, it has various impacts. This variable is associated with discovering failure events shortly after installation or repair (Fan et al. 2021), underscoring the critical importance of Water Utilities' installation, maintenance, and pipe repair processes.
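Deriving this variable from a utility's failure log is straightforward, as the minimal sketch below suggests. The function names and the sample dates are hypothetical, and the log is assumed to contain one date per recorded failure on a given pipe.

```python
from datetime import date


def inter_failure_days(failure_dates):
    """Days elapsed between consecutive recorded failures of one pipe."""
    ordered = sorted(failure_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def days_since_last_failure(failure_dates, reference):
    """Days from the most recent failure on or before the reference date."""
    past = [d for d in failure_dates if d <= reference]
    if not past:
        return None   # no failure recorded before the reference date
    return (reference - max(past)).days


log = [date(2019, 4, 10), date(2019, 3, 1), date(2019, 3, 11)]
gaps = inter_failure_days(log)
elapsed = days_since_last_failure(log, date(2019, 4, 20))
```

A very short first gap after an installation or repair date is the kind of signal that points towards workmanship issues rather than ageing.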

The number of previous failures also exerts a significant impact on model performance. One reason is that Water Utilities can feasibly build a failure database, providing a valuable source of case histories with sufficient traceability to rely on for data. Nevertheless, other operational factors, such as hydraulic network configuration, population density (Fan et al. 2021), pressure changes, transient phenomena, water velocity, and temperature, present complexities in the assessment and calibration phases (Robles-Velasco et al. 2020). Consequently, further research is necessary to evaluate their relationship with pipe failure processes.

A dedicated study focusing on evaluating water physicochemical variables has revealed that residual chlorine and the number of road lanes influence the failure models' outcomes (Almheiri et al. 2021). Another significant factor is the type of soil cover surrounding the pipes, which critically impacts failure probabilities, primarily when the pipes are situated under roads with heavy traffic.

Analysis of the different model outputs

As outlined in Table 8, the outputs of the different models can be grouped according to the specific interests of each group of researchers. Nevertheless, the criteria for these groupings are often unclear or unreported. The risk estimation approaches (RoF and CRS), widely utilized in flooding studies and climate change resilience, remain rarely used in this field, with only 3% of the analysed publications employing them.

Conversely, the Likelihood of Failure (LoF) approach has been more widely adopted, with 35% of the studies utilizing it. As defined by Le Gat (2014), the statistical approach used for estimating failure is based on the developments proposed by Rostum (2000), which suggested the use of the non-homogeneous Poisson process (NHPP). The NHPP approach has gained acceptance within the research community due to its reliable results and solid mathematical foundation. Moreover, derivations of Rostum's processes establish a formula for calculating the probability of a model parameter given observed failure times within a specific time interval.
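As a rough illustration of how an NHPP yields both an expected number of failures and a likelihood of failure over a time window, the sketch below assumes a power-law intensity, so that the cumulative intensity is Lambda(t) = a * t**b. This functional form, the parameter values, and the function names are illustrative assumptions; they are not taken from Rostum (2000) or Le Gat (2014).

```python
import math


def cumulative_intensity(t, a, b):
    """Expected number of failures in [0, t] for a power-law NHPP (assumed form)."""
    return a * t ** b


def expected_failures(t1, t2, a, b):
    """Expected number of failures between pipe ages t1 and t2."""
    return cumulative_intensity(t2, a, b) - cumulative_intensity(t1, a, b)


def likelihood_of_failure(t1, t2, a, b):
    """Probability of at least one failure between pipe ages t1 and t2."""
    return 1.0 - math.exp(-expected_failures(t1, t2, a, b))


# Hypothetical parameters: b > 1 means the failure intensity grows with age.
a, b = 0.002, 1.5
print("Expected failures, ages 40-45:", expected_failures(40, 45, a, b))
print("Likelihood of failure, ages 40-45:", likelihood_of_failure(40, 45, a, b))
```

With b = 1 the process reduces to a homogeneous Poisson process with constant rate a, which is a convenient sanity check when fitting such a model.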

Failure rate (FR) is a widely used output measure, appearing in 27% of the studies in this compendium. Statistical models have been widely used for estimating failure rates. In addition, as noted by Jafari et al. (2021), models such as artificial neural networks (ANNs), genetic algorithms (GAs), and fuzzy inference systems (FISs) have also emerged as popular alternatives for modelling and predicting pipe failure rates. The successful implementation of these models in recent decades stems from their ability to capture the behaviour of explanatory variables, including past failures, and from the data collection improvements that enable their implementation.

A significant category in the model outputs reported in the literature is the number of breaks/failures (NoB), which accounts for 27% of the output variability. Use of this output has decreased over time, after peaking between 2008 and 2015, when many studies employed regression models that were easy to implement and interpret with commercially available statistical software. Nonetheless, this output relies on assumptions of linear or exponential relationships between the future number of failures and some explanatory variables, such as pipe age (Karimian et al. 2021). This assumption has been challenged by several authors (Le Gat 2014), who have shown that pipe age is not a relevant predictor for some pipe materials, such as cement and plastic pipes (Robles-Velasco et al. 2020), while others have acknowledged the role of age in affecting the structural condition of the pipes (Kabir et al. 2015a), but within specific contexts and limitations.

An alternative approach to interpreting the model results involves assessing the pipes' estimated remaining service life from an economic standpoint. This can be achieved by comparing the equivalent annual cost of installing a new pipe with the annual cost of maintaining the existing pipe in service (Snider and McBean 2021). Additionally, deterioration curves are employed to estimate the service life, considering the concept of life cycle cost (Francisque et al. 2017). However, the selection criteria and analysis for this output are often unclear or unavailable in some studies (Zangenehmadar and Moselhi 2016). Previously, this output was commonly associated with a linear or exponential increase in vulnerability or failure risk (Fahmy and Moselhi 2009). Nevertheless, recent advancements have shifted away from linearization by adopting alternative algorithms (Snider and McBean 2021).
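The economic comparison described above can be sketched with the standard capital recovery formula for equivalent annual cost. The cost figures, discount rate, service life, and function names below are hypothetical, and the sketch is a generic illustration rather than the procedure of Snider and McBean (2021).

```python
def equivalent_annual_cost(capital_cost, discount_rate, service_life_years):
    """Annualize a one-off capital cost over the service life of a new pipe."""
    if discount_rate == 0:
        return capital_cost / service_life_years
    factor = discount_rate / (1.0 - (1.0 + discount_rate) ** -service_life_years)
    return capital_cost * factor


def replacement_is_economic(capital_cost, discount_rate, service_life_years,
                            annual_maintenance_cost):
    """True when keeping the old pipe costs more per year than a new pipe."""
    new_pipe_cost = equivalent_annual_cost(
        capital_cost, discount_rate, service_life_years)
    return annual_maintenance_cost > new_pipe_cost


# Hypothetical values: 100,000 replacement cost, 4% discount rate, 80-year life.
eac = equivalent_annual_cost(100_000, 0.04, 80)
print("Equivalent annual cost of a new pipe:", round(eac, 2))
print("Replace at 5,000/year maintenance:",
      replacement_is_economic(100_000, 0.04, 80, 5_000))
```

In practice the annual maintenance cost of the existing pipe would itself come from a failure model, for example an expected number of breaks per year multiplied by a unit repair cost.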

Outputs, such as those based on survival probability and mean time to first/subsequent failure, are also relevant because they account for the effect of left-truncated break records (Xu and Sinha 2021). Some of these studies emphasize the number of previous failures as a key explanatory variable, showing how the selection of the modelling method influences the type of output and the explanatory variables used in the model.
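One common way to account for left truncation in a survival output is to condition on the pipe still being in service at the age when break recording began. The sketch below assumes a Weibull time to first failure; the distribution, the parameter values, and the function names are illustrative assumptions and do not reproduce the method of Xu and Sinha (2021).

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that the first failure occurs after age t (assumed Weibull)."""
    return math.exp(-((t / scale) ** shape))


def conditional_survival(t, t0, shape, scale):
    """Survival to age t, given the pipe was failure-free at age t0 <= t.

    t0 is the pipe age at which break records start (the truncation age).
    """
    if t < t0:
        raise ValueError("t must be greater than or equal to t0")
    return weibull_survival(t, shape, scale) / weibull_survival(t0, shape, scale)


# Hypothetical pipe: records begin at age 30, prediction wanted for age 50.
shape, scale = 2.0, 80.0
print("Unconditional survival to age 50:", weibull_survival(50.0, shape, scale))
print("Survival to age 50 given age 30:",
      conditional_survival(50.0, 30.0, shape, scale))
```

Ignoring the truncation age would understate the survival probability of pipes that are already known to have reached age t0.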

Another approach to analysing the utilized outputs involves examining the relationships between the model types and their associated outcomes, as depicted in Fig. 7 for the most frequently used results. Notably, models based on supervised learning predominantly select the “Likelihood of Failure” as their output, neglecting the use of “Number of Breaks/Failures.” This choice is logical, given that these algorithms' learning process involves computing probabilities for predefined categories (Jo 2021), making them ideally suited for determining likelihoods.

Fig. 7 Number of selected outputs concerning the utilized model types: a graph relating the four most frequent output types

On the other hand, models based on regression analysis are primarily associated with the “Number of Breaks/Failures” output, as these statistical techniques aim to predict variable values based on system variables, with the number of failures being of particular interest for projection. Probabilistic models, however, do not yield a “Failure rate” output, as their focus is not on projecting future failures but on identifying failure rates concerning a risk element.

Conclusions and future research directions

This article delves into the essential field of Asset Management concerning Water Infrastructure. Managing assets is vital in addressing water infrastructure challenges, encompassing economic, health, social, and environmental aspects. Effective Infrastructure Asset Management ensures water infrastructure's long-term maintenance and adaptability, mitigating ageing effects and potential obsolescence.

Various models are utilized to predict WSDN pipe failures. These models encompass a range of approaches, including statistical models and machine learning techniques. They use historical data from Water Utilities and related explanatory variables to estimate the likelihood of future pipe failures. Some standard models used for this purpose include survival analysis models, regression models, neural networks, decision trees, and support vector machines, among others. These models play a critical role in enhancing the understanding of pipe failure dynamics and assisting Water Utilities in making informed decisions for effective asset management and infrastructure maintenance.

What criteria influence the decision to incorporate explanatory variables into the processes?

Incorporating explanatory variables in DWPF modelling processes significantly impacts these procedures. Explanatory variables provide the information that models utilize to formulate their predictions. By considering the influence of variables on predictive model accuracy and reliability, the model's robustness increases, resulting in more precise predictions. It is crucial to ensure that these data accurately represent the behaviour of the analysed system's inventories, including failure data, within a suitable timeframe that aligns with the models' algorithmic requirements and the needs defined by the service and stakeholders, subject to the existence of such data inventories.

The appropriate selection of explanatory variables is essential for reducing potential interferences in understanding causal relationships and their impact on the results obtained. The use of machine learning models, which is increasingly prevalent, can complicate interpretability due to their inherent nature, making it challenging to analyse the influence of each variable on the outcome. Consequently, this hinders identifying possible improvement actions in the planning and operation/maintenance stages of Water Utilities. In complex systems like those analysed, some variables may be interrelated. For instance, some models employ the variable “time between failures” instead of “age,” a technically suitable decision as it assigns more significance to data related to failure events in an unbalanced context. Furthermore, if the variables used in the models are overestimated, the quantification of uncertainty will also be affected.

Identifying the most relevant variables is critical for constructing parsimonious models and preventing overfitting, reducing computational requirements and complexities in model implementation. This consideration gains particular significance when considering the resource limitations in the context of current and future optimization efforts. The generalizability of the models for application in various scenarios and systems also relies on the choice of explanatory variables. An appropriate selection of these variables will positively influence the models' capacity to be trained and implemented on diverse datasets within the same system or for a new group with updated time ranges for the Water Utilities' inventories.

What potential future research directions exist considering DWPF modelling and explanatory variables?

Further research is required to gain a deeper understanding of the sensitivity levels of variables in the most representative models and how environmental conditions impact variable selection, enhancing our comprehension of the complex interaction between these variables. Addressing this issue necessitates executing large, controlled trials with test networks to provide more information and enhance future model accuracy.

To the best of our knowledge, no existing research has focused on exploring techniques to ensure the necessary representativeness of a selected test network. Understanding the complexity of selecting the most appropriate network for testing is crucial to improve the study of variables in pipe failure models. Assessing the impact of these variables on large distribution networks incurs high computational costs. However, by ensuring a suitable selection of a test network, these computational expenses can be significantly reduced, streamlining the optimization processes in the models.

A future approach involves identifying how the network's complexity interacts with variables and assessing whether a model suits specific network or sub-network typologies. A study's potential source of bias is the researcher's influence on the network selection, whether a section, such as a hydraulic sector, or an entire distribution network. Despite this limitation, a methodology is yet to be identified and applied to define test networks in a controlled environment to establish the proper conditions for evaluating model performance and variables.

The evidence from this study suggests that further work with a broader range of networks exhibiting more significant variability and physical location could provide deeper insights into identifying patterns or dynamics of variables and network behaviour under defined patterns of operation and maintenance. Additional research is required to examine the long-term efficacy and safety of pipe installation, operation, and maintenance procedures, as they profoundly impact the probability of failure.

An arguable weakness of the current research lies in the inability to directly apply specific models designed for one network to another without modifying the studied variables and parameters affecting each stage of the modelling process, as the methods may not be readily transferable or comparable. A standardized performance metric is crucial to interpret model results confidently. Addressing this issue requires creating and validating a methodology that allows the evaluation of model performance, not only between different models for the same network but also between models used in different networks. This standardization requirement is vital when specifying the measurement conditions of explanatory variables, considering both their spatiotemporal variability and the range of data collection.
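One candidate for such a metric, offered here only as an assumption and not as a recommendation drawn from the reviewed studies, is a skill score that compares a model against a naive baseline on the same network. The sketch below uses the Brier skill score with the network's own observed failure frequency as the baseline, so that networks with different failure rates are scored on a common scale. The data and function names are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def brier_skill_score(probabilities, outcomes):
    """Relative improvement over always predicting the observed failure frequency.

    1.0 is a perfect model, 0.0 matches the baseline, negative values are worse.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("outcomes are all identical; skill score is undefined")
    return 1.0 - brier_score(probabilities, outcomes) / reference


# Hypothetical predictions for two networks with different failure frequencies.
network_a = ([0.8, 0.1, 0.2, 0.1], [1, 0, 0, 0])
network_b = ([0.6, 0.7, 0.3, 0.4], [1, 1, 0, 0])
print("Network A skill:", brier_skill_score(*network_a))
print("Network B skill:", brier_skill_score(*network_b))
```

A skill score of this kind does not remove the need to standardize how explanatory variables are measured; it only makes the reported performance comparable.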

What limitations can be identified in the current analysed studies?

Several limitations are identified in the reviewed studies. One limitation is the focus on specific datasets or water supply systems, which restricts the generalizability of their findings to broader applications. Comparing results between different systems becomes challenging due to variations in data quality and time availability. Therefore, understanding how the heterogeneity of local conditions, such as material distribution, times between failures, lengths, and other essential variables, impacts the results and model fitting processes requires careful consideration.

The prevailing focus of existing studies has been on applying failure prediction models rather than conducting a comprehensive examination and understanding of the explanatory variables associated with failure processes. Most of the analysed studies do not provide a criterion to identify the most significant variables. The recent utilization of machine learning techniques has further complicated the determination of explanatory variable importance in the modelling process. Consequently, limitations in model applicability often arise due to the necessity for in-depth analysis of the variables integrated within the models.

While the current literature on DWPF primarily addresses the availability of data provided by Water Utility operators, it is crucial to emphasize the active involvement of service operators throughout the entire process, extending well beyond the mere provision of data for pre-processing purposes by researchers. The practical knowledge of operators enables modellers to comprehend the impact of variables on the models and facilitates the functional definition of hyperparameters governing the behaviour of machine learning-based models.

Ensuring adequate systems, services, and support for managing historical data, which forms the basis of the models, should be a top priority for researchers. Relying solely on blind data management at a statistical level, without considering the substantial variability inherent in local distribution networks, raises concerns about the viability of such research strategies. Expertise in modelling processes cannot replace enhancing the quality of underlying data collection processes in system administrations.

The generalizability of model outputs faces limitations that could compromise their validity. Replicating these outputs becomes challenging due to various factors, primarily when models heavily depend on Water Utility data, potentially overlooking valuable historical information. A more comprehensive approach would involve examining a large, carefully selected sample of data directly from the Water Utilities database, thereby revealing how variables interact before comparing the performance of statistical or machine-learning-based failure models.

Limited research addresses the types or groupings of characteristics significantly influencing model outcomes. Many studies solely focus on identifying the best model without thoroughly analysing local conditions and their relationship to the importance of variables or even identifying critical variables. A detailed investigation of how specific variables influence the system or its components is necessary to determine the best model comprehensively. Therefore, exploring these variables' contextual relevance and contribution is vital to gain a deeper understanding of their impact on the system.

One aspect that has yet to be analysed in the various studies is the impact of introducing a new dataset, such as a new inventory period, into the model. Examining the results of incorporating a new dataset can help prevent overfitting, where models perform well with training data but poorly with unseen data. Additionally, bias in data collection may affect the model's ability to generalize to new scenarios. Considering these limitations can enhance the validity and applicability of future research in water pipe failure modelling and their explanatory variables.
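A minimal sketch of such a check is a temporal hold-out: fit on an earlier inventory period and measure the error on a later one. The failure records, the cutoff year, and the naive per-pipe rate used as a stand-in for a fitted model are all hypothetical.

```python
from collections import Counter


def split_by_period(failures, cutoff_year):
    """Split (pipe_id, year) records into an earlier and a later inventory period."""
    earlier = [f for f in failures if f[1] <= cutoff_year]
    later = [f for f in failures if f[1] > cutoff_year]
    return earlier, later


def annual_rate_per_pipe(failures, pipe_ids, n_years):
    """Failures per year for each pipe; pipes without records get a rate of zero."""
    counts = Counter(pipe_id for pipe_id, _ in failures)
    return {pipe_id: counts[pipe_id] / n_years for pipe_id in pipe_ids}


def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed per-pipe rates."""
    return sum(abs(predicted[p] - observed[p]) for p in predicted) / len(predicted)


# Hypothetical records covering 2011-2015 (fitting) and 2016-2018 (new period).
pipe_ids = ["P1", "P2", "P3"]
failures = [("P1", 2011), ("P1", 2013), ("P2", 2014), ("P1", 2017), ("P3", 2018)]

earlier, later = split_by_period(failures, cutoff_year=2015)
fitted = annual_rate_per_pipe(earlier, pipe_ids, n_years=5)
unseen = annual_rate_per_pipe(later, pipe_ids, n_years=3)
print("Error on the new inventory period:", mean_absolute_error(fitted, unseen))
```

A large gap between the error on the fitting period and the error on the new period is the signature of overfitting that the paragraph above describes.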

Another crucial area for improvement is the necessity to analyse the entire universe of data a system could provide. Some studies only examine a sample that may or may not be statistically representative; however, no criteria are presented to demonstrate how this sample represents the system. By not analysing the entire system, the obtained results do not consider the system's overall impact on the model, introducing a methodological limitation. This exclusion of external behavioural factors inherent in the overall system dynamics oversimplifies the analysis, affecting the robustness of the model's results.

This study has not explored the time constraints of each investigation. The time horizon column of Table 7 in Appendix 1 is particularly valuable, as it indicates that most of the analysed studies “predict” failures but do not specify the time frames of these predictions. Nor do they highlight cases where the models' limitations prevent defining how many years or periods ahead the results are applicable. Making predictions for extended periods requires a sufficiently representative inventory database covering a time frame comparable to the one being predicted. Notably, in some cases, the time span of the base data used in the study is not mentioned. The age and quality of system failure inventories constrain the development of solutions in this area. Regardless, when creating and implementing models that involve predictions, it is essential to consider these time constraints in order to analyse the model's implementation and the results obtained thoroughly.

What are the limitations of this study?

The findings of this study offer valuable insights for researchers and water supply network planners to identify relevant variables for their pipeline failure prediction models. Yet, the scope of this document is limited to the identification of variables. It serves as a guide for the selection process, encompassing data acquisition, management, treatment, and interpretation. The unique conditions of each network and service demand a thorough examination.

The quality of the data used stands as a crucial criterion in variable selection. It has been established that the use of incomplete or low-quality data negatively impacts the predictive capabilities of the model (Fan et al. 2021). Determining data quality depends on the nature of the data and local conditions. For example, a distribution system characterized by significant pressure fluctuations, in combination with piping materials sensitive to such changes and transient events, necessitates representative temporal and spatial pressure data to incorporate pressure behaviour as an explanatory variable for failure processes.

Among the most significant limitations of this study, it was not feasible to analyse the data sample sizes, which could affect the generalizability of the conclusions to a broader population. Conducting such an analysis could suggest sample size parameters based on the type of model and variables considered. From a research perspective, we lacked access to the base data of each study, which prevented us from verifying the accuracy and reliability of both the base data and the results. Not knowing the characteristics of the networks being analysed further contributed to this limitation.

The examination of variable effects on the modelling processes was limited, as it relied solely on published studies, which impacted the comprehensiveness of the analysis. Additionally, not all variables could be thoroughly analysed due to space constraints, leading to the selection of the most representative ones for detailed examination. The conclusions of the studies might have been influenced by external factors or unaccounted variables that were not considered in the analysis. The analysis of explanatory variables in the reviewed studies may also have been more intricate than their published findings fully reflect.