Workflow Scheduling Techniques for Big Data Platforms

Chapter in: Resource Management for Big Data Platforms

Part of the book series: Computer Communications and Networks ((CCN))

Abstract

Many applications in scientific fields such as physics, astronomy, biology, and earth science transform a set of data by applying iterative computation steps. From a computer science perspective, these steps can be seen as a pool of tasks with data dependencies. As application complexity grows, the number of workflows also increases. Because a large variety of solutions exists for specific applications and platforms, a systematic analysis of the scheduling models, methods, and algorithms used in workflow applications is needed. This chapter provides a global picture of the existing solutions to support optimal workflow scheduling choices.

Notes

  1. http://www.gartner.com/it-glossary/big-data/.

Acknowledgments

The research presented in this paper is supported by projects: DataWay: Real-time Data Processing Platform for Smart Cities: Making sense of Big Data—PN-II-RU-TE-2014-4-2731; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow—PN-II-PT-PCCA-2013-4-0321; CyberWater grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms—PN-II-PT-PCCA-2013-4-0870.

Author information

Corresponding author

Correspondence to Florin Pop.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Nita, MC., Vasile, M., Pop, F., Cristea, V. (2016). Workflow Scheduling Techniques for Big Data Platforms. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_2

  • DOI: https://doi.org/10.1007/978-3-319-44881-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44880-0

  • Online ISBN: 978-3-319-44881-7

  • eBook Packages: Computer Science, Computer Science (R0)
