Skip to main content
Log in

Meddal: meeting deadlines and data locality via bin packing in cloud environment

  • Regular Paper
  • Published:
Computing Aims and scope Submit manuscript

Abstract

Cloud computing has become one of the most popular frameworks for big data processing and analysis. To have better performance, the data chunks and their corresponding process nodes must be near each other, which is known as data locality. Moreover, to meet the deadline constraints, the available processing power next to the data locations is crucial. Considering the time and resource constraints, designing an appropriate job scheduler is essential for the effective management of memory and processing capacity. These concerns coalesce into the problem of joint data placement and job scheduling with data locality and deadline constraints. In this paper, two kinds of job sets are determined, one with equal deadlines and the other one with various deadlines. The problem of joint data placement and job scheduling with a single deadline is formulated as the problem of bin packing with splittable items and cardinality constraints, which is strongly NP-hard. Moreover, this paper proposes the polynomial-time Meddal algorithm to find a feasible solution with a few processing nodes. Experiment results have shown that the proposed method is effective, with the number of required nodes reduced by \(33\%\) when compared to the algorithms Cred and First Fit.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data Availability

The datasets generated and analyzed during the current study are available in the IUST cloud repository, https://drive.iust.ac.ir/index.php/s/7MCtZXgnpcnbQ5CMeddalData.

References

  1. Puthal D, Sahoo B, Mishra S, Swain S (2015) Cloud computing features, issues, and challenges: a big picture. IEEE, pp 116–123

  2. Josep AD et al (2010) A view of cloud computing. Commun ACM 53(4):50

    Article  Google Scholar 

  3. Mell P, Grance T (2011) The NIST definition of cloud computing. Technical Reports 800-145, National Institute of Standards and Technology (NIST). http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

  4. Xu M, Alamro S, Lan T, Subramaniam S (2017) CRED: cloud right-sizing with execution deadlines and data locality. IEEE Trans Parallel Distrib Syst 28(12):3389–3400

    Article  Google Scholar 

  5. Shi S, Wu C, Li Z (2015) Cost-minimizing online VM purchasing for application service providers with arbitrary demands. pp 146–154

  6. Tarafdar A, Debnath M, Khatua S, Das RK (2021) Energy and makespan aware scheduling of deadline sensitive tasks in the cloud environment. J Grid Comput 19(2):1–25

    Article  Google Scholar 

  7. Singh S, Chana I (2016) A survey on resource scheduling in cloud computing: issues and challenges. J Grid Comput 14(2):217–264

    Article  Google Scholar 

  8. Zhan ZH et al (2015) Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Comput Surv 47(4):63:1-63:33

    Article  Google Scholar 

  9. Magoulès F, Pan J, Teng F (2016) Cloud computing: lata-intensive computing and scheduling. Chapman and Hall/CRC

    Book  Google Scholar 

  10. Ma X, Fan X, Liu J, Jiang H, Peng K (2017) vLocality: revisiting data locality for MapReduce in virtualized clouds. IEEE Netw 31(1):28–35

    Article  Google Scholar 

  11. Zou X, et al (2021) The dilemma between deduplication and locality: can both be achieved? pp 171–185

  12. Shabeera T, Madhu Kumar S (2015) Optimising virtual machine allocation in MapReduce cloud for improved data locality. Int J Big Data Intell 2(1):2–8

    Article  Google Scholar 

  13. Lee S, Jo J, Kim Y (2019) Survey of data locality in apache hadoop. IEEE, pp 46–53

  14. Palanisamy B, Singh A, Liu L, Jain B, Lathrop SA, Costa J, Kramer W (2011) Purlieus: locality-aware resource allocation for MapReduce in a cloud. In: Lathrop SA, Costa J, Kramer W (eds.), Proceedings of conference on high performance computing networking, storage and analysis, ACM, Seattle, November 12–18 2011, pp 58:1–58:11

  15. Tang S, Lee B, He B (2014) DynamicMR: a dynamic slot allocation optimization framework for MapReduce clusters. IEEE Trans Cloud Comput 2(3):333–347

    Article  Google Scholar 

  16. Naik NS, Negi A, Br TB, Anitha R (2019) A data locality based scheduler to enhance MapReduce performance in heterogeneous environments. Future Gener Comput Syst 90:423–434

    Article  Google Scholar 

  17. Ru J, Yang Y, Grundy J, Keung J, Hao L (2019) A highly efficient data locality aware task scheduler for cloud-based systems. IEEE, pp 496–498

  18. Jalalian Z, Sharifi M (2021) A hierarchical multi-objective task scheduling approach for fast big data processing. J Supercomput 78:1–30

    Google Scholar 

  19. Bittencourt L et al (2018) The internet of things, fog and cloud continuum: integration and challenges. Internet Things 3:134

    Article  Google Scholar 

  20. Li S, Lan T, Ra M-R, Panta R (2018) Joint scheduling and source selection for background traffic in erasure-coded storage. IEEE Trans Parallel Distrib Syst 29(12):2826–2837

    Article  Google Scholar 

  21. Epstein L, van Stee R (2011) Improved results for a memory allocation problem. Theory Comput Syst 48(1):79–92

    Article  MATH  Google Scholar 

  22. Garey MR, Johnson DS (1978) “Strong’’ NP-completeness results: motivation, examples, and implications. J ACM 25(3):499–508

    Article  MATH  Google Scholar 

  23. Li C, Cai Q, Lou Y (2022) Optimal data placement strategy considering capacity limitation and load balancing in geographically distributed cloud. Future Gener Comput Syst 127:142–159

    Article  Google Scholar 

  24. Ru J, Yang Y, Grundy J, Keung J, Hao L (2021) An efficient deadline constrained and data locality aware dynamic scheduling framework for multitenancy clouds. Concurr Comput Pract Exp 33(5):e6037

    Article  Google Scholar 

  25. Swain CK, Gupta B, Sahu A (2020) Constraint aware profit maximization scheduling of tasks in heterogeneous datacenters. Computing 102(10):2229–2255

    Article  MATH  Google Scholar 

  26. Li C et al (2019) Data locality optimization based on data migration and hotspots prediction in geo-distributed cloud environment. Knowl Based Syst 165:321–334

    Article  Google Scholar 

  27. Xu M, Alamro S, Lan T, Subramaniam S (2018) Chronos: a unifying optimization framework for speculative execution of deadline-critical MapReduce jobs. IEEE, pp 718–729

  28. Li C, Liu J, Wang M, Luo Y (2022) Fault-tolerant scheduling and data placement for scientific workflow processing in geo-distributed clouds. J Syst Softw 187:111227

    Article  Google Scholar 

  29. Kang Y, Pan L, Liu S (2022) Job scheduling for big data analytical applications in clouds: a taxonomy study. Future Gener Comput Syst 135:129

    Article  Google Scholar 

  30. Albers S, Quedenfeld J (2018) Optimal algorithms for right-sizing data centers

  31. Alamro S, Xu M, Lan T, Subramaniam S (2020) Shed+: optimal dynamic speculation to meet application deadlines in cloud. IEEE Trans Netw Serv Manag 17(3):1515–1526

    Article  Google Scholar 

  32. Toosi AN, Sinnott RO, Buyya R (2018) Resource provisioning for data-intensive applications with deadline constraints on hybrid clouds using Aneka. Future Gener Comput Syst 79:765–775

    Article  Google Scholar 

  33. Yuan H et al (2017) TTSA: an effective scheduling approach for delay bounded tasks in hybrid clouds. IEEE Trans Cybern 47(11):3658–3668

    Article  Google Scholar 

  34. Kaur K, Kumar N, Garg S, Rodrigues JJPC (2018) EnLoc: data locality-aware energy-efficient scheduling scheme for cloud data centers. pp 1–6

  35. Lee JH, Jang H, Kim HJ (2020) Iterative job splitting algorithms for parallel machine scheduling with job splitting and setup resource constraints. J Oper Res Soc 72:1–20

    Google Scholar 

  36. Epstein L, Levin A, van Stee R (2012) Approximation schemes for packing splittable items with cardinality constraints. Algorithmica 62(1–2):102–129

    Article  MATH  Google Scholar 

  37. LeCun B, Mautor T, Quessette F, Weisser M-A (2015) Bin packing with fragmentable items: presentation and approximations. Theor Comput Sci 602:50–59

    Article  MATH  Google Scholar 

  38. Grégoire JC, Hamel AM (2014) On scheduling live media streaming in the cloud—a study. IEEE, pp 1–6

  39. Casazza M, Ceselli A (2016) Exactly solving packing problems with fragmentation. Comput Oper Res 75:202–213

    Article  MATH  Google Scholar 

  40. Beaumont O, Eyraud-Dubois L, Caro CT, Rejeb H (2013) Heterogeneous resource allocation under degree constraints. IEEE Trans Parallel Distrib Syst 24(5):926–937

    Article  Google Scholar 

  41. Jaykrishnan G, Levin A (2022) EPTAS for the dual of splittable bin packing with cardinality constraint. arXiv preprint arXiv:2204.04685

  42. Fleszar K (2022) A MILP model and two heuristics for the bin packing problem with conflicts and item fragmentation. Eur J Oper Res 303:37

    Article  MATH  Google Scholar 

  43. Ekici A (2022) Variable-sized bin packing problem with conflicts and item fragmentation. Comput Ind Eng 163:107844

    Article  Google Scholar 

  44. Ekici A (2021) Bin packing problem with conflicts and item fragmentation. Comput Oper Res 126:105113

    Article  MATH  Google Scholar 

  45. Jalaparti V et al (2015) Network-aware scheduling for data-parallel jobs: plan when you can. Comput Commun Rev 45(5):407–420

    Article  Google Scholar 

  46. Reiss C, Wilkes J, Hellerstein JL (2011) Google cluster-usage traces: format + schema. Technical Report, Google Inc., Mountain View, CA, USA. Revised 2014-11-17 for version 2.1. Posted at https://github.com/google/cluster-data

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marzieh Malekimajd.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Malekimajd, M. Meddal: meeting deadlines and data locality via bin packing in cloud environment. Computing 105, 249–273 (2023). https://doi.org/10.1007/s00607-022-01122-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00607-022-01122-0

Keywords

Mathematics Subject Classification

Navigation