Data Prefetching for Scientific Workflow Based on Hadoop

Chen, Gaozhao; Wu, Shaochun; Gu, Rongrong; Xu, Yongquan; Xu, Lingyu; Ge, Yunwen; Song, Cuicui

doi:10.1007/978-3-642-30454-5_6

Part of the book series: Studies in Computational Intelligence (SCI, volume 429)

Abstract

Data-intensive scientific workflows based on Hadoop require large volumes of data transfer and storage. To address this problem on a compute cluster with limited computing resources, this paper uses data prefetching to hide the overhead of data search and transfer and to reduce data access delays. A prefetching algorithm for data-intensive scientific workflows that takes the available computing resources into account is proposed. Experimental results indicate that the algorithm reduces response time and improves efficiency.
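The general idea of hiding transfer overhead behind computation can be illustrated with a minimal Python sketch. This is not the chapter's algorithm, which is not reproduced on this page. The functions `fetch` and `compute` and the parameter `free_slots` are hypothetical stand-ins for data transfer, task execution, and the resources available for prefetching.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name, delay=0.01):
    """Hypothetical stand-in for locating and transferring one task's input."""
    time.sleep(delay)
    return f"data:{name}"


def compute(data, delay=0.01):
    """Hypothetical stand-in for executing one workflow task on its input."""
    time.sleep(delay)
    return data.upper()


def run_sequential(tasks):
    """Baseline: each task waits for its own fetch before computing."""
    return [compute(fetch(t)) for t in tasks]


def run_with_prefetch(tasks, free_slots=1):
    """Overlap fetches for upcoming tasks with the current computation.

    free_slots is an assumed resource budget: the number of fetches
    allowed in flight at once. A budget of 0 disables prefetching.
    """
    if free_slots <= 0:
        return run_sequential(tasks)
    results = []
    with ThreadPoolExecutor(max_workers=free_slots) as pool:
        # Start fetching the first few inputs before any computation begins.
        pending = [pool.submit(fetch, t) for t in tasks[:free_slots]]
        for i in range(len(tasks)):
            data = pending.pop(0).result()  # input for task i
            nxt = i + free_slots
            if nxt < len(tasks):
                # Refill the window so later inputs arrive during compute.
                pending.append(pool.submit(fetch, tasks[nxt]))
            results.append(compute(data))
    return results
```

Both functions return the same results in the same order. The prefetching variant differs only in when the fetches are issued, so any benefit depends on fetch and compute times being comparable and on spare resources being available.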

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, 200072, P.R. China
Gaozhao Chen, Shaochun Wu, Rongrong Gu, Yongquan Xu, Lingyu Xu, Yunwen Ge & Cuicui Song

Corresponding author

Correspondence to Gaozhao Chen.

Editor information

Editors and Affiliations

Software Engineering & Information Technology Institute, Central Michigan University, Mt. Pleasant, 48859, Michigan, USA
Roger Lee

About this chapter

Cite this chapter

Chen, G. et al. (2012). Data Prefetching for Scientific Workflow Based on Hadoop. In: Lee, R. (eds) Computer and Information Science 2012. Studies in Computational Intelligence, vol 429. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30454-5_6

DOI: https://doi.org/10.1007/978-3-642-30454-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30453-8
Online ISBN: 978-3-642-30454-5
eBook Packages: Engineering, Engineering (R0)
