On Replica Placement in High-Availability Storage Under Correlated Failure

  • Conference paper
Combinatorial Optimization and Applications

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9486)

Abstract

A new model describing dependencies among system components as a directed graph is presented and used to solve a novel replica placement problem in data centers. A criterion for optimizing replica placements is formalized and explained. In this work, the optimization goal is to choose placements in which correlated failure events disable as few replicas as possible. A fast optimization algorithm is given for dependency models represented by trees. The main contribution of the paper is an \(O(n + \rho \log \rho)\) dynamic programming algorithm for placing \(\rho\) replicas on a tree with \(n\) vertices.
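
To make the optimization goal concrete, the following is a minimal sketch of how a given placement on a tree might be evaluated. It is not the paper's dynamic programming algorithm and does not achieve the stated running time. It assumes, for illustration only, that a failure at a vertex disables every vertex below it, so the number of replicas lost to that failure equals the number of replicas in the vertex's subtree. The function names, the `parent` map, and the toy topology are all hypothetical.

```python
from collections import Counter

def replicas_disabled(parent, placement):
    """For each vertex v, count the replicas placed in v's subtree.

    parent[v] is the parent of v, or None for the root.
    Assumption (not from the source): a failure at v disables
    every vertex in the subtree rooted at v.
    """
    disabled = {v: 0 for v in parent}
    for vertex in placement:
        v = vertex
        # Walk up to the root, charging this replica to every ancestor.
        while v is not None:
            disabled[v] += 1
            v = parent[v]
    return disabled

def failure_profile(parent, placement):
    """Histogram: number of vertices whose failure disables exactly k replicas."""
    return Counter(replicas_disabled(parent, placement).values())

# Hypothetical toy tree: one root, two racks, two servers per rack.
parent = {
    "root": None,
    "rack1": "root", "rack2": "root",
    "s1": "rack1", "s2": "rack1",
    "s3": "rack2", "s4": "rack2",
}

# Two replicas in the same rack versus one replica per rack.
clustered = replicas_disabled(parent, ["s1", "s2"])
spread = replicas_disabled(parent, ["s1", "s3"])
```

In this toy instance, a failure of `rack1` disables both replicas under the clustered placement but only one under the spread placement, while a failure of `root` disables both replicas either way. Comparing such profiles is one plausible reading of "disable as few replicas as possible"; the paper's formal criterion may differ.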

This work was supported, in part, by the National Science Foundation (NSF) under grant number CNS-1115733.

The original version of this chapter was revised: Contents were corrected throughout the chapter. The erratum to this chapter is available at 10.1007/978-3-319-26626-8_60


Acknowledgments

We would like to acknowledge insightful comments from S. Venkatesan and Balaji Raghavachari during meetings about results contained in this paper, as well as comments from Conner Davis on a draft version of this paper.

Author information

Corresponding author

Correspondence to K. Alex Mills.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mills, K.A., Chandrasekaran, R., Mittal, N. (2015). On Replica Placement in High-Availability Storage Under Correlated Failure. In: Lu, Z., Kim, D., Wu, W., Li, W., Du, DZ. (eds) Combinatorial Optimization and Applications. Lecture Notes in Computer Science, vol 9486. Springer, Cham. https://doi.org/10.1007/978-3-319-26626-8_26

  • DOI: https://doi.org/10.1007/978-3-319-26626-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26625-1

  • Online ISBN: 978-3-319-26626-8

  • eBook Packages: Computer Science, Computer Science (R0)
