Abstract
We report our experiences in porting and tuning the Apache Spark data analytics framework on the Cray XC30 (Edison) and XC40 (Cori) systems, installed at NERSC. We find that design decisions made in the development of Spark are based on the assumption that Spark is constrained primarily by network latency, and that disk I/O is comparatively cheap. These assumptions are not valid on Edison or Cori, which feature advanced low-latency networks but have diskless compute nodes. Lustre metadata access latency is a major bottleneck, severely constraining scalability. We characterize this problem with benchmarks run on a system with both Lustre and local disks, and show how to mitigate high metadata access latency by using per-node loopback filesystems for temporary storage. With this technique, we reduce the shuffle time and improve application scalability from O(100) to O(10, 000) cores on Cori. For shuffle-intensive machine learning workloads, we show better performance than clusters with local disks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
National Energy Research Scientific Computing Center. https://www.nersc.gov
spark-perf benchmark. https://github.com/databricks/spark-perf
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD 2015, pp. 1383–1394. ACM (2015). http://doi.acm.org/10.1145/2723372.2742797
Chaimov, N., Malony, A., Canon, S., Iancu, C., Ibrahim, K.Z., Srinivasan, J.: Scaling spark on hpc systems. In: Proceedings of the International Conference on High-Performance Parallel and Distributed Computing (2015)
Davidson, A., Or, A.: Optimizing shuffle performance in spark. http://www.cs.berkeley.edu/kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf
Franklin, M.: Making sense of big data with the berkeley data analytics stack. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, pp. 1–2. ACM, New York (2015). http://doi.acm.org/10.1145/2684822.2685326
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: Proceedings of OSDI, pp. 599–613. https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-gonzalez.pdf
Jacobsen, D.M., Canon, R.S.: Contain this, unleashing docker for hpc. In: Cray Users Group (2015)
Maschhoff, K.J., Ringenburg, M.F.: Experiences running and optimizing the berkeley data analytics stack on cray platforms. In: Cray Users Group (2015)
Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D.B., Amde, M., Owen, S., Xin, D., Xin, R., Franklin, M.J., Zadeh, R., Zaharia, M., Talwalkar, A.: MLlib: machine learning in apache spark. http://arxiv.org/abs/1505.06807
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10 (2010)
Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E.: Apache hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing, SOCC 2013, pp. 5:1–5:16. ACM (2013). http://doi.acm.org/10.1145/2523616.2523633
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI 2012, p. 2. USENIX Association (2012). http://dl.acm.org/citation.cfm?id=2228298.2228301
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot topics in Cloud Computing, vol. 10, p. 10. http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
Zaharia, M., Das, T., Li, H., Shenker, S., Stoica, I.: Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. In: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Ccomputing, HotCloud 2012, pp. 10–10. USENIX Association, Berkeley (2012). http://dl.acm.org/citation.cfm?id=2342763.2342773
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Chaimov, N., Malony, A., Iancu, C., Ibrahim, K. (2016). Scaling Spark on Lustre. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_45
Download citation
DOI: https://doi.org/10.1007/978-3-319-46079-6_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer ScienceComputer Science (R0)