Long-Term Pipeline Failure Prediction Using Nonparametric Survival Analysis

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12460)

Abstract

Australian water infrastructure is more than a hundred years old and has begun to show its age through water main failures. Our work concerns approximately half a million pipelines across major Australian cities that deliver water to houses and businesses, serving over five million customers. Failures of these buried assets cause property damage and water supply disruptions. We applied Machine Learning techniques to find a cost-effective solution to the pipe failure problem in these Australian cities, where an average of 1500 water main failures occur each year. To achieve this objective, we construct a detailed picture of the behaviour of the water pipe network by developing a Machine Learning model that assesses and predicts the likelihood of water main failure using historical failure records, pipe descriptors and other environmental factors. Our results indicate that our system, which incorporates a nonparametric survival analysis technique called ‘Random Survival Forest’, outperforms several popular algorithms and expert heuristics in long-term prediction. In addition, we construct a statistical inference technique to quantify the uncertainty associated with the long-term predictions.
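As a rough illustration of what "nonparametric survival analysis" means for pipe failure data, the sketch below computes a Kaplan-Meier survival curve in plain Python. This is not the Random Survival Forest system described in the abstract; Kaplan-Meier is a simpler, standard nonparametric estimator, swapped in here only to show how right-censored records (pipes that had not failed when observation ended) enter a survival estimate. The function name `kaplan_meier` and the toy ages and failure flags are hypothetical and are not taken from the paper.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: time each pipe was observed, e.g. age in years (toy data).
    observed:  True if a failure was recorded at that time, False if the
               pipe was still intact when observation ended (right-censored).
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)   # failures plus censorings at each time
    at_risk = len(durations)     # pipes still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Failed and censored pipes both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical pipes: age in years and whether a failure was recorded.
ages = [10, 20, 20, 35, 50, 50, 60]
failed = [True, True, False, True, False, True, False]

for t, s in kaplan_meier(ages, failed):
    print(f"S({t}) = {s:.3f}")
```

Censored pipes never lower the curve directly, but they shrink the risk set, so later failures weigh more heavily. A forest-based method such as Random Survival Forest builds on the same kind of censoring-aware estimate while also conditioning on covariates such as pipe descriptors.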

Acknowledgement

We sincerely thank the Australian water utilities Sydney Water, UnityWater and Western Water for sharing data, expert domain knowledge and valuable feedback.

Author information

Corresponding author

Correspondence to Dilusha Weeraddana.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Weeraddana, D., MallawaArachchi, S., Warnakula, T., Li, Z., Wang, Y. (2021). Long-Term Pipeline Failure Prediction Using Nonparametric Survival Analysis. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_9

  • DOI: https://doi.org/10.1007/978-3-030-67667-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67666-7

  • Online ISBN: 978-3-030-67667-4

  • eBook Packages: Computer Science, Computer Science (R0)
