
Integrating SDN-Enhanced MPI with Job Scheduler to Support Shared Clusters

Conference paper in Sustained Simulation Performance 2018 and 2019

Abstract

SDN-enhanced MPI is a framework that integrates the network programmability of Software-Defined Networking (SDN) into the Message Passing Interface (MPI). Its aim is to improve MPI communication performance by dynamically steering traffic within the interconnect according to the communication pattern of each application. A major limitation of the current implementation of SDN-enhanced MPI is that multiple jobs cannot be executed concurrently. This paper removes this limitation by integrating SDN-enhanced MPI with the cluster's job scheduler. Specifically, we have developed a plugin for the job scheduler that collects job information and reports it to the interconnect controller. A preliminary evaluation demonstrated that applications could achieve a communication speedup of up to 2.56×.
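To make the scheduler integration more concrete, the following is a minimal sketch of the kind of per-job bookkeeping an interconnect controller needs once several jobs share the cluster. It is not the authors' implementation. The scheduler plugin reports which node runs each MPI rank when a job starts and withdraws that entry when the job ends, so the controller can translate a rank-to-rank communication of a specific job into a host-to-host flow. The class and method names (`JobRegistry`, `job_started`, `job_finished`, `host_of`, `flow_endpoints`, `running_jobs`), the table layout, the in-memory SQLite store, and the one-rank-per-node mapping are all hypothetical choices made for illustration.

```python
import sqlite3
from typing import Optional


class JobRegistry:
    """Hypothetical controller-side registry of running jobs (illustration only)."""

    def __init__(self) -> None:
        # In-memory SQLite database holding the rank-to-host mapping of every running job.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE allocation ("
            " job_id TEXT NOT NULL,"
            " rank INTEGER NOT NULL,"
            " host TEXT NOT NULL,"
            " PRIMARY KEY (job_id, rank))"
        )

    def job_started(self, job_id: str, hosts: list[str]) -> None:
        """Called by the (hypothetical) scheduler hook at job start.

        Assumes hosts[i] is the node running MPI rank i (one rank per node).
        """
        with self.db:  # commits the inserts as one transaction
            self.db.executemany(
                "INSERT INTO allocation (job_id, rank, host) VALUES (?, ?, ?)",
                [(job_id, rank, host) for rank, host in enumerate(hosts)],
            )

    def job_finished(self, job_id: str) -> None:
        """Called by the (hypothetical) scheduler hook at job end."""
        with self.db:
            self.db.execute("DELETE FROM allocation WHERE job_id = ?", (job_id,))

    def host_of(self, job_id: str, rank: int) -> Optional[str]:
        """Return the node hosting a given rank of a job, or None if unknown."""
        row = self.db.execute(
            "SELECT host FROM allocation WHERE job_id = ? AND rank = ?",
            (job_id, rank),
        ).fetchone()
        return row[0] if row else None

    def flow_endpoints(
        self, job_id: str, src_rank: int, dst_rank: int
    ) -> Optional[tuple[str, str]]:
        """Map a rank-to-rank message of one job to a (source, destination) host pair."""
        src = self.host_of(job_id, src_rank)
        dst = self.host_of(job_id, dst_rank)
        if src is None or dst is None:
            return None
        return (src, dst)

    def running_jobs(self) -> list[str]:
        """List the IDs of all jobs that currently have an allocation."""
        rows = self.db.execute("SELECT DISTINCT job_id FROM allocation ORDER BY job_id")
        return [r[0] for r in rows]


if __name__ == "__main__":
    reg = JobRegistry()
    # Two jobs running concurrently on disjoint nodes.
    reg.job_started("job-a", ["node01", "node02"])
    reg.job_started("job-b", ["node03", "node04"])
    print(reg.running_jobs())                   # ['job-a', 'job-b']
    print(reg.flow_endpoints("job-b", 0, 1))    # ('node03', 'node04')
    reg.job_finished("job-a")
    print(reg.running_jobs())                   # ['job-b']
```

Keying every lookup on the job ID is what lets rank numbers from different concurrent jobs coexist: rank 0 of `job-a` and rank 0 of `job-b` resolve to different nodes. In a real deployment the scheduler plugin and the controller would run as separate processes, so these calls would travel over some remote interface; here they are direct method calls for brevity.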





Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers JP17K00168 and JP26330145.

Author information

Corresponding author

Correspondence to Keichi Takahashi.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Takahashi, K., Date, S., Watashiba, Y., Kido, Y., Shimojo, S. (2020). Integrating SDN-Enhanced MPI with Job Scheduler to Support Shared Clusters. In: Resch, M., Kovalenko, Y., Bez, W., Focht, E., Kobayashi, H. (eds) Sustained Simulation Performance 2018 and 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-39181-2_13
