Abstract
Improving the transmission efficiency of small files over a wide area network remains challenging. Because of the way transfer protocols are designed, time is wasted waiting for transmission commands, so each file incurs additional round-trip time (RTT) delay. GridFTP has been widely deployed as a transfer protocol since the grid era, and it introduced the concept of pipelining to improve the transmission efficiency of small files. Based on the GridFTP protocol, we design a data structure to classify files and propose a corresponding scheduling algorithm that tunes the pipelining parameter so that it adapts to different transmission scenarios. By combining the optimal pipelining and concurrency parameters, our strategy optimizes bandwidth usage when a large number of small files are transferred. A method for optimizing the throughput of high-priority file transfers is also proposed. By adjusting the pipelining parameter dynamically, throughput is increased by almost 10% compared with other methods. Moreover, our method achieves better performance even with a smaller concurrency setting. Favorable throughput is also maintained when transferring high-priority files.
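To make the effect of pipelining on small-file transfers concrete, the following is a minimal sketch of a toy cost model, not the scheduling algorithm proposed in this paper. It assumes that, without pipelining, every file pays one idle round trip before its data flows, and that with a pipelining depth of `p` this idle round trip is paid only once per group of `p` files. The function names and the batching assumption are hypothetical and are chosen only for illustration.

```python
import math


def transfer_time(num_files, file_size_bytes, bandwidth_bytes_per_s,
                  rtt_s, pipelining=1):
    """Estimate total transfer time in seconds under a toy model.

    Assumption (not from the paper): one idle round trip is paid per
    group of `pipelining` files; the data itself moves at full bandwidth.
    """
    if pipelining < 1:
        raise ValueError("pipelining depth must be at least 1")
    # Time spent actually moving bytes.
    data_time = num_files * file_size_bytes / bandwidth_bytes_per_s
    # Number of idle round trips spent waiting on transfer commands.
    idle_round_trips = math.ceil(num_files / pipelining)
    return data_time + idle_round_trips * rtt_s


def throughput(num_files, file_size_bytes, bandwidth_bytes_per_s,
               rtt_s, pipelining=1):
    """Effective throughput in bytes per second under the same toy model."""
    total_bytes = num_files * file_size_bytes
    return total_bytes / transfer_time(
        num_files, file_size_bytes, bandwidth_bytes_per_s, rtt_s, pipelining)


if __name__ == "__main__":
    # Hypothetical scenario: 1000 files of 100 kB, 1 Gbit/s link, 100 ms RTT.
    for depth in (1, 10, 100):
        t = transfer_time(1000, 100_000, 125_000_000, 0.1, depth)
        print(f"pipelining={depth:>3}: {t:.1f} s")
```

Under these assumptions the command wait dominates the total time for small files, which is why the pipelining depth is worth tuning. A real transfer also depends on concurrency, TCP behavior, and server load, none of which this sketch models.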
Acknowledgement
This work is supported by the National Key R&D Program of China under Grant 2018YFB0203902, the National Natural Science Foundation of China under Grant No. 61972364, and the Fundamental Research Funds for the Central Universities under Grant No. 2652021001.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wu, S., Sun, D., Gao, S., Zhang, G. (2022). End-to-End Dynamic Pipelining Tuning Strategy for Small Files Transfer. In: Xiang, W., Han, F., Phan, T.K. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 413. Springer, Cham. https://doi.org/10.1007/978-3-030-93479-8_2
DOI: https://doi.org/10.1007/978-3-030-93479-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93478-1
Online ISBN: 978-3-030-93479-8
eBook Packages: Computer Science, Computer Science (R0)