
Environment-Sensitive Performance Tuning for Distributed Service Orchestration

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

Modern distributed systems are designed to tolerate unreliable environments, i.e., to keep providing services even when failures occur in the underlying hardware or network. However, an unreliable environment can significantly degrade a distributed system's performance, and this impact should be considered when deploying services. In this paper, we present an approach that optimizes the performance of distributed systems deployed in unreliable environments by searching for optimal configuration parameters. To simulate an unreliable environment, we inject failures into the environment of a service application, such as a node crash in the cluster, network failures between nodes, and resource contention within nodes. We then use a search algorithm to automatically find the optimal parameters in a user-selected parameter space under the unreliable environment we created. We have implemented our approach in a testing-based framework and applied it to several well-known distributed service systems.
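The workflow the abstract describes — perturb the environment, then search the configuration space for the best-performing parameters — can be sketched in miniature. This is an illustrative sketch only, not the authors' framework: the parameter names, the cost model inside `run_benchmark`, and the plain random-search strategy (simpler than, e.g., the recursive random search of reference 23) are all assumptions.

```python
import random

# Hypothetical configuration space for a distributed service
# (parameter names are illustrative, not taken from the paper).
PARAM_SPACE = {
    "worker_threads": [2, 4, 8, 16],
    "heartbeat_ms": [100, 500, 1000],
    "replication_factor": [1, 2, 3],
}

def run_benchmark(config, failure_scenario):
    """Stand-in for deploying the service, injecting the failure,
    and measuring execution time. Here we fake a cost surface;
    lower is better."""
    cost = 100.0 / config["worker_threads"]
    cost += config["heartbeat_ms"] * 0.01
    # Under a node-crash scenario, extra replication pays off.
    if failure_scenario == "node_crash":
        cost += 50.0 / config["replication_factor"]
    return cost

def random_search(failure_scenario, trials=200, seed=0):
    """Sample random configurations from the user-selected space
    and keep the one with the lowest measured cost."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(trials):
        config = {k: rng.choice(v) for k, v in PARAM_SPACE.items()}
        cost = run_benchmark(config, failure_scenario)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost

if __name__ == "__main__":
    config, cost = random_search("node_crash")
    print(config, round(cost, 2))
```

The key point the sketch captures is that the search is run *under* the injected failure scenario, so the "optimal" configuration it returns is environment-sensitive: a different scenario would yield a different cost surface and potentially a different winner.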

F. Ivančić and G. Balakrishnan—Current affiliation: Google, Inc.


Notes

  1. Execution time is not obtained from Ganglia, but from the Linux time command.
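The footnote above distinguishes the monitoring source (Ganglia) from the timing source (the Linux time command). A minimal sketch of the latter, assuming GNU time is installed at /usr/bin/time and using `sleep 1` as a placeholder for the actual workload:

```shell
# GNU /usr/bin/time (assumed available; the BSD/macOS variant has
# different flags) reports timings for the measured command.
# %e = elapsed wall-clock seconds, %U = user CPU, %S = system CPU.
# "sleep 1" stands in for the benchmark run being timed.
/usr/bin/time -f "elapsed=%e user=%U sys=%S" sleep 1
```

Using an external timer like this measures end-to-end execution time independently of the monitoring stack, which is presumably why it is preferred over Ganglia's sampled metrics here.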

References

  1. Ganglia. http://ganglia.sourceforge.net/

  2. Juju. https://juju.ubuntu.com/

  3. Juju Charms. https://jujucharms.com/

  4. LoadRunner. http://www.hp.com/go/LoadRunner

  5. LXC. http://linuxcontainers.org/

  6. Selenium. http://seleniumhq.org/

  7. Allspaw, J.: Fault injection in production. Commun. ACM 55(10), 48–52 (2012)

  8. Babu, S.: Towards automatic optimization of MapReduce programs. In: SOCC 2010, pp. 137–142 (2010)

  9. Banabic, R., Candea, G.: Fast black-box testing of system recovery code. In: Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys 2012, pp. 281–294 (2012)

  10. Broadwell, P., Sastry, N., Traupman, J.: FIG: A prototype tool for online verification of recovery. In: Workshop on Self-Healing, Adaptive and Self-Managed Systems (2002)

  11. Carbone, M., Rizzo, L.: Dummynet revisited. SIGCOMM Comput. Commun. Rev. 40(2), 12–20 (2010)

  12. Dawson, S., Jahanian, F., Mitton, T.: Experiments on six commercial TCP implementations using a software fault injection tool. Softw. Pract. Exper. 27(12), 1385–1410 (1997)

  13. Gunawi, H., Do, T., Joshi, P., Alvaro, P., Hellerstein, J., Arpaci-Dusseau, A., Arpaci-Dusseau, R., Sen, K., Borthakur, D.: FATE and DESTINI: A framework for cloud recovery testing. In: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI 2011 (2011)

  14. Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of MapReduce programs. In: VLDB 2011, pp. 1111–1122 (2011)

  15. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: A self-tuning system for big data analytics. In: CIDR 2011, pp. 261–272 (2011)

  16. Hoarau, W., Tixeuil, S., Vauchelles, F.: FAIL-FCI: Versatile fault injection. Future Gener. Comput. Syst. 23(7), 913–919 (2007)

  17. Joshi, P., Ganai, M., Balakrishnan, G., Gupta, A., Papakonstantinou, N.: SETSUDO: Perturbation-based testing framework for scalable distributed systems. In: Proceedings of the Conference on Timely Results in Operating Systems (2013)

  18. Joshi, P., Gunawi, H., Sen, K.: PREFAIL: A programmable tool for multiple-failure injection. In: Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2011, pp. 171–188 (2011)

  19. Lübke, R., Lungwitz, R., Schuster, D., Schill, A.: Large-scale tests of distributed systems with integrated emulation of advanced network behavior. WWW/Internet 10(2), 138–151 (2013)

  20. Marinescu, P., Candea, G.: Efficient testing of recovery code using fault injection. ACM Trans. Comput. Syst. 29(4), 11:1–11:38 (2011)

  21. Molyneaux, I.: The Art of Application Performance Testing: Help for Programmers and Quality Assurance. O'Reilly Media (2009)

  22. Tseitlin, A.: The antifragile organization. Commun. ACM 56(8), 40–44 (2013)

  23. Ye, T., Kalyanaraman, S.: A recursive random search algorithm for large-scale network parameter configuration. In: SIGMETRICS 2003, pp. 196–205 (2003)


Author information

Corresponding author

Correspondence to Yu Lin.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lin, Y., Ivančić, F., Joshi, P., Balakrishnan, G., Ganai, M., Gupta, A. (2015). Environment-Sensitive Performance Tuning for Distributed Service Orchestration. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science(), vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_18


  • DOI: https://doi.org/10.1007/978-3-319-17353-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science, Computer Science (R0)
