Autonomous Security Mechanisms for High-Performance Computing Systems: Review and Analysis

Hou, Tao; Wang, Tao; Shen, Dakun; Lu, Zhuo; Liu, Yao

doi:10.1007/978-3-030-33432-1_6



Abstract

High-performance computing (HPC) has played an increasingly important role in research, commerce, and national security. Although HPC systems may inherit security issues from general-purpose computers, simply retrofitting traditional security mechanisms onto HPC systems is inappropriate or ineffective. In this chapter, we provide an overview of the design and architecture of HPC systems and analyze their potential threats and vulnerabilities. We also analyze how defense mechanisms can be used for autonomous cyber defense in HPC systems, considering their implementation, methodology, application, and performance. This chapter provides a comprehensive review of autonomous security mechanisms for HPC and sheds light on applying security defense mechanisms to HPC systems.





Author information

Authors and Affiliations

University of South Florida, Tampa, FL, USA
Tao Hou, Zhuo Lu & Yao Liu
New Mexico State University, Las Cruces, NM, USA
Tao Wang
Central Michigan University, Mount Pleasant, MI, USA
Dakun Shen


Corresponding author

Correspondence to Zhuo Lu.

Editor information

Editors and Affiliations

Center for Secure Information Systems, George Mason University, Fairfax, VA, USA
Sushil Jajodia
Thayer School of Engineering, Dartmouth College, Hanover, NH, USA
George Cybenko
Department of Computer Science, Dartmouth College, Hanover, NH, USA
V.S. Subrahmanian
MS T310, MITRE Corporation, McLean, VA, USA
Vipin Swarup
Computing and Information Science Division, Army Research Office, Durham, NC, USA
Cliff Wang
Computer Science & Engineering, University of Michigan, Ann Arbor, MI, USA
Michael Wellman


About this chapter

Cite this chapter

Hou, T., Wang, T., Shen, D., Lu, Z., Liu, Y. (2020). Autonomous Security Mechanisms for High-Performance Computing Systems: Review and Analysis. In: Jajodia, S., Cybenko, G., Subrahmanian, V., Swarup, V., Wang, C., Wellman, M. (eds) Adaptive Autonomous Secure Cyber Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-33432-1_6


DOI: https://doi.org/10.1007/978-3-030-33432-1_6
Published: 05 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33431-4
Online ISBN: 978-3-030-33432-1
eBook Packages: Computer Science, Computer Science (R0)
