Abstract
Data-intensive scientific workflows running on Hadoop require large volumes of data to be transferred and stored. To address this problem on an execution cluster with limited computing resources, this paper uses data prefetching to hide the overhead of data lookup and transfer and to reduce data access latency. It proposes a prefetching algorithm for data-intensive scientific workflows that takes the available computing resources into account. Experimental results indicate that the algorithm reduces response time and improves efficiency.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, G. et al. (2012). Data Prefetching for Scientific Workflow Based on Hadoop. In: Lee, R. (eds) Computer and Information Science 2012. Studies in Computational Intelligence, vol 429. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30454-5_6
DOI: https://doi.org/10.1007/978-3-642-30454-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30453-8
Online ISBN: 978-3-642-30454-5
eBook Packages: Engineering, Engineering (R0)