A Data-Aware Scheduling Framework for Parallel Applications in a Cloud Environment

  • Conference paper
Emerging Trends in Computing and Communication

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 298)

Abstract

Cloud infrastructures can provide massive computational and data-processing capabilities in virtualized environments. The introduction of big data analytics in many spheres of science, technology and business has led to a trend of employing data-parallel frameworks, such as Hadoop, to handle such massive data requirements. Since most Hadoop-based systems make the two decisions of scheduling data and scheduling computation independently, mapping computations onto cloud resources based on the data blocks already distributed to them is a promising prospect. This paper proposes a computation scheduling framework, hereafter referred to as the Data Aware Scheduling (DAS) framework, that improves the co-allocation of computation and data within a Hadoop cloud infrastructure using knowledge of data block availability. The proposed DAS framework employs a dependency-based grouping of data. Experiments have been conducted using standard map-reduce applications, and the results presented herein conclusively demonstrate the efficacy of the proposed framework.
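
The general idea of mapping computations to the nodes that already hold their data blocks can be illustrated with a small sketch. This is not the DAS framework itself, whose details are not given in the abstract; it is a generic greedy locality heuristic written under stated assumptions. The function name `assign_tasks`, the node names, the block names and the input layout are all hypothetical, and the dependency-based grouping of data is not modelled.

```python
from collections import Counter


def assign_tasks(task_blocks, block_locations, nodes):
    """Greedily place each task on the node holding most of its input blocks.

    Hypothetical illustration only; not the algorithm from the paper.
    task_blocks:     task name -> list of input block names
    block_locations: block name -> set of nodes that store a replica
    nodes:           list of candidate node names
    """
    load = {node: 0 for node in nodes}  # tasks already placed on each node
    placement = {}
    for task in sorted(task_blocks):  # fixed order keeps the result deterministic
        local = Counter()  # node -> number of this task's blocks stored there
        for block in task_blocks[task]:
            for node in block_locations.get(block, ()):
                if node in load:
                    local[node] += 1
        # Prefer more local blocks, then a lighter load, then the node name.
        best = min(nodes, key=lambda n: (-local[n], load[n], n))
        placement[task] = best
        load[best] += 1
    return placement


# Assumed example layout: three nodes, three blocks, three tasks.
block_locations = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
task_blocks = {"t1": ["b1", "b2"], "t2": ["b3"], "t3": ["b1"]}
placement = assign_tasks(task_blocks, block_locations, ["n1", "n2", "n3"])
print(placement)
```

In this assumed layout every task lands on a node that stores at least one of its input blocks, and the load tie-break sends `t3` to `n1` rather than to the already busy `n2`.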

Acknowledgments

This work has been carried out at the Data Sciences Lab, Department of Computer Science and Engineering, National Institute of Science and Technology, Berhampur.

Author information

Corresponding authors

Correspondence to B. Jaykishan, K. Hemant Kumar Reddy or Diptendu Sinha Roy.

Copyright information

© 2014 Springer India

About this paper

Cite this paper

Jaykishan, B., Hemant Kumar Reddy, K., Roy, D.S. (2014). A Data-Aware Scheduling Framework for Parallel Applications in a Cloud Environment. In: Sengupta, S., Das, K., Khan, G. (eds) Emerging Trends in Computing and Communication. Lecture Notes in Electrical Engineering, vol 298. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1817-3_49

  • DOI: https://doi.org/10.1007/978-81-322-1817-3_49

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1816-6

  • Online ISBN: 978-81-322-1817-3

  • eBook Packages: Engineering, Engineering (R0)
