Abstract
Effective scheduling is a key concern in data-intensive grid computing, where high performance must be achieved under many constraints. Based on a Dual-Component and Dual-Queue Distributed Schedule Model (DCDQDSM), we present task and data co-scheduling algorithms that reduce the time a scheduled task waits to access its datasets. First, data replication and elimination are scheduled by an independent approach. Second, if a task is divisible, the task and its dataset are divided into subtasks and the data subsets they require; task scheduling itself adopts a general approach. Finally, when a scheduled task or subtask does not hit its dataset, the associated data transfer is bound to that task. Based on the relation between task execution and data access, data replication and computation may proceed concurrently, either within one scheduled task that has a divisible dataset or between scheduled tasks. Theoretical analysis and experimental results suggest that the scheduling algorithms improve execution performance and resource utilization.
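The benefit of overlapping data replication with computation for a divisible task can be illustrated with a small timing model. This is a minimal sketch, not the paper's algorithm: the `Subtask` type, both function names, and the assumptions of a single transfer channel and a single compute node are hypothetical and introduced only for illustration. A subtask that hits its data subset is modelled with a transfer time of zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    # Hypothetical model: time to replicate this subtask's data subset to the
    # compute node (0.0 if the subset is already there, i.e. a "hit").
    transfer_time: float
    # Time to process the data subset once it is available.
    compute_time: float


def serial_makespan(subtasks: List[Subtask]) -> float:
    """Completion time when each transfer must finish before its computation
    starts and nothing overlaps."""
    return sum(s.transfer_time + s.compute_time for s in subtasks)


def overlapped_makespan(subtasks: List[Subtask]) -> float:
    """Completion time when the next data subset is transferred while the
    current one is being processed (one transfer channel, one compute node)."""
    transfer_done = 0.0  # time at which the latest transfer finishes
    compute_done = 0.0   # time at which the latest computation finishes
    for s in subtasks:
        transfer_done += s.transfer_time
        # A subtask starts once its data has arrived and the node is free.
        compute_done = max(compute_done, transfer_done) + s.compute_time
    return compute_done


if __name__ == "__main__":
    parts = [Subtask(transfer_time=2.0, compute_time=3.0) for _ in range(4)]
    print(serial_makespan(parts), overlapped_makespan(parts))
```

In this model only the first transfer is fully exposed when computation is the bottleneck; the remaining transfers are hidden behind computation, which is the kind of waiting-time reduction the abstract describes.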
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, C., Chen, D., Zheng, Y., Hu, H. (2004). Performance-Driven Task and Data Co-scheduling Algorithms for Data-Intensive Applications in Grid Computing. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds) Advanced Web Technologies and Applications. APWeb 2004. Lecture Notes in Computer Science, vol 3007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24655-8_36
DOI: https://doi.org/10.1007/978-3-540-24655-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21371-0
Online ISBN: 978-3-540-24655-8
eBook Packages: Springer Book Archive