Abstract
Australian water infrastructure is more than a hundred years old and has begun to show its age through water main failures. Our work concerns approximately half a million pipelines across major Australian cities that deliver water to houses and businesses, serving over five million customers. Failures of these buried assets damage property and disrupt water supply. We applied Machine Learning techniques to find a cost-effective solution to the pipe failure problem in these Australian cities, where on average 1500 water main failures occur each year. To achieve this objective, we build a detailed picture of the behaviour of the water pipe network by developing a Machine Learning model that assesses and predicts the likelihood of water main failure using historical failure records, pipe descriptors and other environmental factors. Our results indicate that our system, which incorporates a nonparametric survival analysis technique called ‘Random Survival Forest’, outperforms several popular algorithms and expert heuristics in long-term prediction. In addition, we construct a statistical inference technique to quantify the uncertainty associated with the long-term predictions.
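To make the idea of nonparametric survival analysis on pipe failure records more concrete, the following is a minimal sketch of a Kaplan-Meier survival estimate. It is not the Random Survival Forest system described above; it only illustrates how a survival curve can be estimated without assuming a parametric failure distribution, and how pipes that have not yet failed (censored records) still contribute to the estimate. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: years each pipe was under observation (hypothetical unit)
    observed:  True if the pipe failed at that time, False if it was
               still intact when observation ended (right-censored)
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # pipes leaving the risk set at time t
    at_risk = len(durations)        # pipes still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk pipes that survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Failed and censored pipes both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six pipes, three observed failures, three censored.
durations = [5, 8, 8, 12, 15, 20]
observed = [True, True, False, True, False, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"year {t}: survival {s:.3f}, failure probability {1 - s:.3f}")
```

The estimated failure probability by year t is 1 - S(t). A forest-based method such as the one named in the abstract additionally conditions this kind of estimate on pipe attributes and environmental factors, which this sketch does not attempt.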
Acknowledgement
We sincerely thank the Australian water utilities Sydney Water, UnityWater and Western Water for sharing data, expert domain knowledge and valuable feedback.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Weeraddana, D., MallawaArachchi, S., Warnakula, T., Li, Z., Wang, Y. (2021). Long-Term Pipeline Failure Prediction Using Nonparametric Survival Analysis. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_9
DOI: https://doi.org/10.1007/978-3-030-67667-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67666-7
Online ISBN: 978-3-030-67667-4
eBook Packages: Computer Science (R0)