Abstract
Achieving both the minimum average coflow completion time (CCT) and isolation guarantees for multiple tenants is a challenge in cloud environments, because the two targets conflict and cannot be achieved simultaneously. Prior solutions pursue only one of the two targets: either minimizing the average CCT or providing isolation guarantees. They are also limited to clairvoyant scheduling, that is, they assume that complete knowledge of coflow sizes is available before communication starts. In this paper, we propose an efficient scheduling algorithm, smallest-height-first DRF (SHFDRF), that provides near-optimal scheduling and isolation guarantees without prior knowledge of coflow sizes. SHFDRF achieves long-term isolation guarantees and the minimum average CCT by combining a smallest-height-first policy with a monopolistic dominant resource fairness (DRF) bandwidth allocation strategy. This combination also improves link utilization and system throughput. Trace-driven simulation shows that SHFDRF enables communication stages to complete 1.28×, 2.27×, and 6.28× faster at the 95th percentile than DRF, NCDRF, and Per-Flow Fairness, respectively. Even compared with the minimum CCT, the coflow completion time is only 13.9% slower at the 95th percentile. Overall, the performance of SHFDRF is acceptable, and it can be applied in real datacenters without requiring complete prior knowledge.
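For readers unfamiliar with the DRF baseline named above, the following is a minimal sketch of a plain DRF-style equal-progress bandwidth allocation across links. It is not SHFDRF: the abstract does not define the smallest-height-first rule or the monopolistic variant, so neither is modeled here. The function name `drf_equal_progress`, the per-link demand vectors, and the capacities are illustrative assumptions.

```python
from typing import Dict, List


def drf_equal_progress(
    demands: Dict[str, List[float]], capacity: List[float]
) -> Dict[str, List[float]]:
    """Plain DRF-style allocation (illustrative, not the paper's SHFDRF).

    demands[name][i] is the bandwidth coflow `name` wants on link i.
    Every coflow receives the same share on its dominant (most loaded) link,
    and its other links are scaled in proportion to its demand vector.
    """
    n_links = len(capacity)

    # Normalize each demand vector so the dominant link has share 1.0.
    normalized: Dict[str, List[float]] = {}
    for name, demand in demands.items():
        shares = [demand[i] / capacity[i] for i in range(n_links)]
        dominant = max(shares)
        if dominant <= 0:
            raise ValueError(f"coflow {name!r} has no positive demand")
        normalized[name] = [s / dominant for s in shares]

    # The most contended link limits the common progress of all coflows.
    load = [sum(v[i] for v in normalized.values()) for i in range(n_links)]
    progress = 1.0 / max(load)

    # Convert the common progress back into absolute bandwidth per link.
    return {
        name: [progress * v[i] * capacity[i] for i in range(n_links)]
        for name, v in normalized.items()
    }


# Hypothetical example: two unit-capacity links and two coflows with
# mirrored demand vectors.
allocation = drf_equal_progress(
    {"A": [1.0, 0.5], "B": [0.5, 1.0]}, capacity=[1.0, 1.0]
)
```

In this hypothetical example each coflow receives two thirds of its dominant link and one third of the other link, so both links are fully used. With less symmetric demands, an equal-progress allocation of this kind can leave some links partly idle, which is one reason link utilization is treated as a separate concern.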
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant No. 61772386).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Li, C., Zhang, H., Ding, W. et al. Fair and near-optimal coflow scheduling without prior knowledge of coflow size. J Supercomput 77, 7690–7717 (2021). https://doi.org/10.1007/s11227-020-03614-2
DOI: https://doi.org/10.1007/s11227-020-03614-2