Data Prefetching for Scientific Workflow Based on Hadoop

  • Chapter
Computer and Information Science 2012

Part of the book series: Studies in Computational Intelligence ((SCI,volume 429))

Abstract

Data-intensive scientific workflows based on Hadoop require large amounts of data transfer and storage. To address this problem in an execution cluster with limited computing resources, this paper uses data prefetching to hide the overhead of data search and transfer and to reduce data access delays. A prefetching algorithm for data-intensive scientific workflows that takes the available computing resources into account is proposed. Experimental results indicate that the algorithm reduces response time and improves efficiency.
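The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: overlapping the fetch of upcoming input data with the computation of the current task, with the amount of look-ahead capped by a resource budget. Everything here is an assumption made for illustration: `fetch` and `compute` are hypothetical stand-ins (no Hadoop or HDFS API is called), and `max_prefetch` is an invented parameter playing the role of "available computing resources".

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(block_id, delay=0.01):
    """Hypothetical stand-in for a slow remote read (not an HDFS call)."""
    time.sleep(delay)
    return f"data-{block_id}"


def compute(data, delay=0.01):
    """Hypothetical stand-in for a workflow task consuming one input block."""
    time.sleep(delay)
    return data.upper()


def run_with_prefetch(block_ids, max_prefetch=1):
    """Process blocks in order, fetching up to `max_prefetch` blocks ahead.

    `max_prefetch` is an assumed knob modelling the resource limit:
    0 means every block is fetched on demand (no overlap).
    """
    results = []
    pending = deque()  # futures for requested blocks, in task order
    remaining = iter(block_ids)

    with ThreadPoolExecutor(max_workers=max(1, max_prefetch)) as pool:

        def top_up(limit):
            # Request more blocks until `limit` fetches are outstanding.
            while len(pending) < limit:
                try:
                    block_id = next(remaining)
                except StopIteration:
                    return
                pending.append(pool.submit(fetch, block_id))

        while True:
            if not pending:
                top_up(1)  # nothing prefetched: fall back to on-demand fetch
                if not pending:
                    break  # no blocks left
            data = pending.popleft().result()  # blocks only if fetch unfinished
            top_up(max_prefetch)  # start fetching ahead before computing
            results.append(compute(data))
    return results


if __name__ == "__main__":
    print(run_with_prefetch([1, 2, 3], max_prefetch=2))
```

With `max_prefetch=0` the loop degenerates to fetch-then-compute for each block; with a larger value, fetches for later blocks run in background threads while the current block is being computed, which is where the access delay is hidden.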

Author information

Corresponding author

Correspondence to Gaozhao Chen.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chen, G. et al. (2012). Data Prefetching for Scientific Workflow Based on Hadoop. In: Lee, R. (eds) Computer and Information Science 2012. Studies in Computational Intelligence, vol 429. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30454-5_6

  • DOI: https://doi.org/10.1007/978-3-642-30454-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30453-8

  • Online ISBN: 978-3-642-30454-5

  • eBook Packages: Engineering, Engineering (R0)
