Throughput Analytics of Data Transfer Infrastructures

  • Conference paper
Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom 2018)

Abstract

To support increasingly distributed scientific and big-data applications, powerful data transfer infrastructures are being built with dedicated networks and software frameworks customized to distributed file systems and data transfer nodes. The data transfer performance of such infrastructures depends critically on the combined choices of file, disk, and host systems, as well as network protocols and file transfer software, all of which may vary across sites. The randomness of throughput measurements makes it challenging to assess the impact of these choices on the performance of the infrastructure or its parts. We propose regression-based throughput profiles obtained by aggregating measurements from sites of the infrastructure, with RTT as the independent variable. The peak values and convex-concave shape of a profile together determine the overall throughput performance of memory and file transfers, and its variations show the performance differences among the sites. We then present projection and difference operators, and coefficients of throughput profiles, to characterize the performance of the infrastructure and its parts, including sites and file transfer tools. In particular, the utilization-concavity coefficient provides a value in the range [0, 1] that reflects overall transfer effectiveness. We present results of measurements collected using (i) testbed experiments over dedicated 0–366 ms 10 Gbps connections with combinations of TCP versions, file systems, host systems, and transfer tools, and (ii) Globus GridFTP transfers over production infrastructure with varying site configurations.
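As a rough illustration of the profile idea described in the abstract, the sketch below aggregates throughput samples by RTT into a mean-throughput profile and labels each interior RTT point as concave or convex by comparing it with the chord between its neighbours. It is a minimal sketch under stated assumptions, not the paper's method: the function names `throughput_profile` and `point_shapes`, the per-RTT mean as the regression estimate, and the sample values are all hypothetical, and the utilization-concavity coefficient itself is not reproduced here because its definition is not given in the abstract.

```python
from collections import defaultdict
from statistics import mean


def throughput_profile(measurements):
    """Aggregate (rtt_ms, throughput_gbps) samples into a profile.

    Assumption: the regression estimate at each RTT is taken to be the
    plain mean of the samples at that RTT.
    Returns a list of (rtt_ms, mean_gbps) sorted by RTT.
    """
    by_rtt = defaultdict(list)
    for rtt_ms, gbps in measurements:
        by_rtt[rtt_ms].append(gbps)
    return sorted((rtt, mean(vals)) for rtt, vals in by_rtt.items())


def point_shapes(profile):
    """Label each interior profile point by local shape.

    A point lying above the chord joining its two neighbours is labelled
    'concave', below is 'convex', and exactly on the chord is 'linear'.
    """
    shapes = []
    for (r0, t0), (r1, t1), (r2, t2) in zip(profile, profile[1:], profile[2:]):
        chord = t0 + (t2 - t0) * (r1 - r0) / (r2 - r0)
        if t1 > chord:
            label = "concave"
        elif t1 < chord:
            label = "convex"
        else:
            label = "linear"
        shapes.append((r1, label))
    return shapes


# Hypothetical samples (RTT in ms, throughput in Gbps); not measured data.
samples = [
    (0, 9.5), (0, 9.0),
    (11, 9.0), (11, 8.5),
    (22, 8.0),
    (45, 4.0),
    (91, 2.0),
    (183, 1.0),
]

profile = throughput_profile(samples)
peak_gbps = max(gbps for _, gbps in profile)
shapes = point_shapes(profile)

print(profile)
print(peak_gbps)
print(shapes)
```

With these made-up samples the profile is concave at the two smallest interior RTTs and convex at the larger ones, which is the kind of convex-concave shape the abstract refers to. A real analysis would use measured data and the coefficients defined in the paper.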

This work is funded by the RAMSES project and the Applied Mathematics Program, Office of Advanced Computing Research, U.S. Department of Energy, and by the Extreme Scale Systems Center, sponsored by the U.S. Department of Defense, and was performed at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.


Author information

Corresponding author

Correspondence to Nageswara S. V. Rao.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Rao, N.S.V., Liu, Q., Liu, Z., Kettimuthu, R., Foster, I. (2019). Throughput Analytics of Data Transfer Infrastructures. In: Gao, H., Yin, Y., Yang, X., Miao, H. (eds) Testbeds and Research Infrastructures for the Development of Networks and Communities. TridentCom 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 270. Springer, Cham. https://doi.org/10.1007/978-3-030-12971-2_2

  • DOI: https://doi.org/10.1007/978-3-030-12971-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12970-5

  • Online ISBN: 978-3-030-12971-2

  • eBook Packages: Computer Science (R0)
