A framework for resilience management in the cloud

Ein Framework für Resilience Management in der Cloud

Abstract

Cloud environments make resilience more challenging because of the sharing of non-virtualised resources, frequent reconfigurations, and cyber attacks on these flexible and dynamic systems. We present a Cloud Resilience Management Framework (CRMF), which models and then applies an existing resilience strategy in a cloud operating context to diagnose anomalies. The framework uses an end-to-end feedback loop that allows remediation to be integrated with the existing cloud management systems. We demonstrate the applicability of the framework with a use-case for effective cloud resilience management.

Zusammenfassung (English translation)

Because of the sharing of non-virtualised resources, frequent reconfigurations, and cyber attacks on these flexible and dynamic systems, cloud environments pose greater challenges for resilience. This paper presents a Cloud Resilience Management Framework (CRMF) that models an existing resilience strategy in the context of cloud operation and applies it there to detect anomalies. The framework uses an end-to-end feedback loop that makes it possible to integrate remediation into existing cloud management systems. In addition, the applicability of the framework is demonstrated through a use case with efficient cloud resilience management.

Notes

  1. In the NIST cloud computing reference architecture [8], the term tenant is used for consumers who use cloud-based services.

  2. The work presented here was carried out within the FP7 SECCRIT (SEcure Cloud computing for CRitical infrastructure IT) project (FP7-SEC-2012-1), a multidisciplinary research project whose mission is to analyse and evaluate cloud computing technologies with respect to security risks in sensitive environments, and to develop methodologies, technologies, and best practices for creating a secure, trustworthy, and high-assurance cloud computing environment.

  3. European Union Agency for Network and Information Security: http://www.enisa.europa.eu/.

  4. ResumeNet: http://www.comp.lancs.ac.uk/resilience/.

  5. Heat Orchestration Template: http://docs.openstack.org/developer/heat/template_guide/hot_guide.html.

  6. OpenStack: http://www.openstack.org/.

  7. Volatility framework: https://code.google.com/p/volatility/.

  8. libVMI: https://code.google.com/p/vmitools/.

  9. tcpdump/libpcap: http://www.tcpdump.org/.

  10. libpcap API: http://www.tcpdump.org/.

  11. IND2UCE: http://www.iese.fraunhofer.de/en/competencies/security/usage_control/philosophy_uc.html.

References

  1. PRECYSE (2014): http://www.precyse.eu/. Accessed: 2014-10-26.

  2. ResumeNet (2014): http://www.resumenet.eu/. Accessed: 2014-10-26.

  3. TClouds (2014): http://www.tclouds-project.eu//. Accessed: 2014-10-26.

  4. SECCRIT Consortium (2013): An architectural framework for critical infrastructure in cloud computing. Technical report.

  5. Ali, A., Schaeffer-Filho, A., Smith, P., Hutchison, D. (2010): Justifying a policy based approach for DDoS remediation: a case study. In 11th annual conference on the convergence of telecommunications, networking & broadcasting, PGNet 2010, Liverpool, UK (pp. 21–22).

  6. Angelov, P., Yager, R. (2011): Simplified fuzzy rule-based systems using non-parametric antecedents and relative data density. In IEEE workshop on evolving and adaptive intelligent systems, EAIS (pp. 62–69). New York: IEEE Press.

  7. Beigi, M. S., Calo, S., Verma, D. (2004): Policy transformation techniques in policy-based systems management. In Proceedings of the fifth IEEE international workshop on policies for distributed systems and networks, POLICY 2004 (pp. 13–22). New York: IEEE Press.

  8. Bohn, R. B., Messina, J., Liu, F., Tong, J., Mao, J. (2011): NIST cloud computing reference architecture. In Proceedings of the IEEE world congress on services, SERVICES ’11, Washington, DC, USA (pp. 594–596). Los Alamitos: IEEE Comput. Soc. ISBN 978-0-7695-4461-8. doi:10.1109/SERVICES.2011.105.

  9. Santiago Cáceres, E., Oliviero, F. (2013): Deliverable 1.2: report on requirements and use cases. https://seccrit.eu/upload/D2-1-Report_on_requirements_and_use_cases-v2.0.pdf.

  10. Casassa Mont, M., Baldwin, A., Goh, C. (2000): Power prototype: towards integrated policy-based management. In Network operations and management symposium. NOMS 2000 (pp. 789–802). New York: IEEE/IFIP.

  11. Catteddu, D. (2011): Security and resilience in governmental clouds: making an informed decision. Technical report, European Network and Information Security Agency (ENISA). http://www.enisa.europa.eu/act/rm/emerging-and-future-risk/deliverables/security-and-resilience-in-governmental-clouds.

  12. Cholda, P., Mykkeltveit, A., Helvik, B. E., Wittner, O. J., Jajszczyk, A. (2007): A survey of resilience differentiation frameworks in communication networks. IEEE Commun. Surv. Tutor., 9(4), 32–55.

  13. Cuppens, F., Miege, A. (2003): Administration model for OR-BAC. In On the move to meaningful Internet systems, OTM 2003 workshops (pp. 754–768). Berlin: Springer.

  14. Cuppens, F., Cuppens-Boulahia, N., Coma, C. (2006): MotOrBAC: un outil d'administration et de simulation de politiques de sécurité. In First joint conference security in network architectures (SAR) and security of information systems (SSI) (pp. 6–9).

  15. Gamer, T. (2009): Anomaly-based identification of large-scale attacks. In Global telecommunications conference, GLOBECOM 2009 (pp. 1–6). New York: IEEE Press.

  16. Hegering, H.-G., Abeck, S., Wies, R. (1996): A corporate operation framework for network service management. IEEE Commun. Mag., 34(1), 62–68.

  17. Kaikini, P., Lewis, L., Malik, R., Rustici, E., Scott, W., Sycamore, S., Thebaut, S. (1999): Method and apparatus for defining and enforcing policies for configuration management in communications networks, February 16, 1999. US patent 5,872,928.

  18. Abou El Kalam, A., Baida, R. E., Balbiani, P., Benferhat, S., Cuppens, F., Deswarte, Y., Miege, A., Saurel, C., Trouessin, G. (2003): Organisation based access control. In Proceedings of IEEE 4th international workshop on policies for distributed systems and networks. POLICY 2003 (pp. 120–131). New York: IEEE Press.

  19. Lakhina, A., Crovella, M., Diot, C. (2005): Mining anomalies using traffic feature distributions. In Proceedings of the 2005 conference on applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’05, New York, NY, USA (pp. 217–228). New York: ACM. ISBN 1-59593-009-4. doi:10.1145/1080091.1080118.

  20. Lughofer, E., Guardiola, C. (2008): On-line fault detection with data-driven evolving fuzzy models. Control Intell. Syst., 36(4), 307.

  21. Marnerides, A., Watson, M., Shirazi, N., Mauthe, A., Hutchison, D. (2013): A snapshot of malware analysis over the cloud: network and system characteristics. In Proc. IEEE Globecom 2013 workshop on cloud computing systems, networks, and applications, CCSNA.

  22. Marnerides, A., James, C., Schaeffer-Filho, A., Sait, S. Y., Mauthe, A., Murthy, H. (2011): Multi-level network resilience: traffic analysis, anomaly detection and simulation. ICTACT Journal on Communication Technology, Special Issue on Next Generation Wireless Networks and Applications, 2(2).

  23. Marnerides, A. K., Hutchison, D., Pezaros, D. P. (2010): Autonomic diagnosis of anomalous network traffic. In IEEE international symposium on a world of wireless mobile and multimedia networks, WoWMoM (pp. 1–6). New York: IEEE Press.

  24. Meyer, B., Anstötz, F., Popien, C. (1996): Towards implementing policy-based systems management. Distrib. Syst. Eng., 3(2), 78.

  25. Neal, D. (2011): Amazon web services outages raise serious cloud questions. Technical report, March 2011, http://www.v3.co.uk/v3-uk/news/2035726/amazon-web-services-outages-raise-cloud-questions.

  26. Oblak, S., Škrjanc, I., Blažič, S. (2007): Fault detection for nonlinear systems with uncertain parameters based on the interval fuzzy model. Eng. Appl. Artif. Intell., 20(4), 503–510.

  27. Roos, J., Putter, P., Bekker, C. (1993): Modelling management policy using enriched managed objects. In Proceedings of the IFIP TC6/WG6.6, third international symposium on integrated network management with participation of the IEEE communications society CNOM and with support from the institute for educational services (pp. 207–215). Amsterdam: North-Holland.

  28. Schaeffer-Filho, A., Smith, P., Mauthe, A. (2011): Policy-driven network simulation: a resilience case study. In Proceedings of the ACM symposium on applied computing (pp. 492–497). New York: ACM.

  29. Schaeffer-Filho, A., Mauthe, A., Hutchison, D., Smith, P., Yu, Y., Fry, M. (2013): Preset: a toolset for the evaluation of network resilience strategies. In IFIP/IEEE international symposium on integrated network management, IM 2013 (pp. 202–209). New York: IEEE Press.

  30. Shirazi, N.-u.-h., Simpson, S., Marnerides, A. K., Watson, M., Mauthe, A., Hutchison, D. (2014): Assessing the impact of intra-cloud live migration on anomaly detection. In IEEE 3rd international conference on cloud networking, CloudNet, Oct. 2014 (pp. 52–57). doi:10.1109/CloudNet.2014.6968968.

  31. Sterbenz, J.P.G., Hutchison, D., Çetinkaya, E. K., Jabbar, A., Rohrer, J. P., Schöller, M., Smith, P. (2010): Resilience and survivability in communication networks: strategies, principles, and survey of disciplines. Comput. Netw., 54(8), 1245–1265.

  32. CSA CCM Leadership Team (2010): Cloud security alliance cloud controls matrix v1.1. Technical report.

  33. Verma, D. C. (2000): Policy-based networking: architecture and algorithms. San Francisco: New Riders Publishing.

  34. Yu, Y., Fry, M., Schaeffer-Filho, A., Smith, P., Hutchison, D. (2011): An adaptive approach to network resilience: evolving challenge detection and mitigation. In 8th international workshop on the design of reliable communication networks, DRCN (pp. 172–179). New York: IEEE Press.

Acknowledgements

The research presented in this paper is sponsored by the EU FP7 Project SECCRIT (Secure Cloud Computing for Critical Infrastructure IT), grant agreement no. 312758. The work on “Deployment function” and “IND2UCE” is by SECCRIT Consortium members NEC (NEC Europe Ltd) and IESE (Fraunhofer Institute for Experimental Software Engineering IESE), respectively. We are grateful to Plamen Angelov for his insightful comments and input on the use of the Recursive Density Estimation technique in the implementation of the Network Analysis Engine.

Author information

Corresponding author

Correspondence to Noor-ul-hassan Shirazi.

Cite this article

Shirazi, Nuh., Simpson, S., Oechsner, S. et al. A framework for resilience management in the cloud. Elektrotech. Inftech. 132, 122–132 (2015). https://doi.org/10.1007/s00502-015-0290-9

  • DOI: https://doi.org/10.1007/s00502-015-0290-9

Keywords

  • resilience management
  • cloud infrastructures
  • policy management
  • remediation

Schlüsselwörter

  • Resilience Management
  • Cloud-Infrastrukturen
  • Policy Management
  • Remediation