
Network slicing to improve multicasting in HPC clusters

Published in Cluster Computing.

Abstract

In high performance computing (HPC), extensive experiments are frequently executed, and HPC resources (e.g., computing machines and switches) must be able to run several experiments in parallel. HPC typically exploits parallelism in programs, processing, and data; the underlying network remains the only non-parallelized HPC component (i.e., there is no dynamic virtual slicing based on HPC jobs). In this paper, we present an approach that uses software defined networking (SDN) to parallelize HPC clusters among the running experiments. We accomplish this through two major components: a passive module (network mapper/remapper) that selects, as soon as each experiment starts, the least busy resources in the network, and an active SDN-HPC load balancer that performs more complex and intelligent operations. The active load balancer can logically divide the network based on the experiments' host files, with the goal of reducing traffic to unnecessary hosts and ports: an HPC experiment should multicast only to the cluster nodes it actually uses, rather than broadcast to the whole cluster. We use the virtual tenant network (VTN) modules in the OpenDaylight controller to create VLANs based on HPC experiments. On each HPC host, virtual interfaces are created to isolate the traffic of the different experiments, and traffic between physical hosts that belong to the same experiment is distinguished by the VLAN ID assigned to that experiment. We evaluate the new approach using several public HPC benchmarks. Results show a significant enhancement in experiment performance, especially when the HPC cluster runs several heavy-load experiments simultaneously. They also show that this multicasting approach can significantly reduce the casting overhead caused by broadcasting to all resources in the HPC cluster.
Compared with InfiniBand networks, which offer interconnect services with low latency and high bandwidth, SDN-based HPC services provide two distinct benefits that may not be possible with InfiniBand. The first is the integration of HPC with Ethernet enterprise networks, which expands HPC usage to much wider domains. The second is the ability to let users and their applications customize HPC services with the different QoS requirements that fit the needs of those applications and optimize the usage of HPC clusters.
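To make the slicing idea concrete, the following is a minimal sketch of the per-experiment VLAN assignment described above: each experiment's MPI-style host file is parsed, the experiment is given its own VLAN ID, and each of its nodes is mapped to a tagged virtual sub-interface. All names here (`assign_vlans`, `BASE_VLAN_ID`, the `eth0.<vlan>` interface convention) are illustrative assumptions, not code from the paper.

```python
# Hypothetical sketch of experiment-to-VLAN slicing; names are illustrative,
# not taken from the paper's implementation.

BASE_VLAN_ID = 100  # assumed starting VLAN tag for experiment slices


def parse_host_file(text):
    """Return the node names listed in an MPI-style host file.

    Lines look like 'node01 slots=4'; blanks and '#' comments are skipped.
    """
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        nodes.append(line.split()[0])
    return nodes


def assign_vlans(experiments):
    """Give each experiment a VLAN ID and per-node virtual interface names.

    experiments: dict mapping experiment name -> host file contents.
    Returns dict mapping experiment name -> (vlan_id, {node: virtual_iface}).
    """
    slices = {}
    for i, (name, host_file) in enumerate(sorted(experiments.items())):
        vlan_id = BASE_VLAN_ID + i
        nodes = parse_host_file(host_file)
        # e.g. node01 gets a tagged sub-interface eth0.100 for VLAN 100,
        # so its traffic reaches only nodes in the same experiment slice
        ifaces = {node: f"eth0.{vlan_id}" for node in nodes}
        slices[name] = (vlan_id, ifaces)
    return slices


if __name__ == "__main__":
    experiments = {
        "expA": "node01 slots=4\nnode02 slots=4\n",
        "expB": "node02 slots=2\nnode03 slots=2\n",
    }
    for name, (vlan, ifaces) in assign_vlans(experiments).items():
        print(name, vlan, sorted(ifaces))
```

In the actual system, the VLAN creation would be pushed to the switches through the OpenDaylight VTN interface rather than computed locally; a node shared by two experiments (node02 above) simply carries one tagged sub-interface per experiment.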



Author information

Corresponding author

Correspondence to Izzat Alsmadi.


Cite this article

Alsmadi, I., Khreishah, A. & Xu, D. Network slicing to improve multicasting in HPC clusters. Cluster Comput 21, 1493–1506 (2018). https://doi.org/10.1007/s10586-017-1561-5
