Abstract
High-performance computing (HPC) plays an increasingly important role in research, commerce, and national security. Although HPC systems may inherit security issues from general-purpose computers, simply retrofitting traditional security mechanisms to HPC systems is inappropriate or ineffective. In this chapter, we give an overview of the design and architecture of HPC systems and analyze their potential threats and vulnerabilities. We also examine how defense mechanisms can be used for autonomous cyber defense in HPC systems, considering their implementation, methodology, application, and performance. The chapter provides a comprehensive review of autonomous security mechanisms for HPC systems and sheds light on how security defense mechanisms can be applied to them.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hou, T., Wang, T., Shen, D., Lu, Z., Liu, Y. (2020). Autonomous Security Mechanisms for High-Performance Computing Systems: Review and Analysis. In: Jajodia, S., Cybenko, G., Subrahmanian, V., Swarup, V., Wang, C., Wellman, M. (eds) Adaptive Autonomous Secure Cyber Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-33432-1_6
DOI: https://doi.org/10.1007/978-3-030-33432-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33431-4
Online ISBN: 978-3-030-33432-1
eBook Packages: Computer Science, Computer Science (R0)