A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing

Khanna, Gaurav; Catalyurek, Umit; Kurc, Tahsin; Sadayappan, P.; Saltz, Joel

doi:10.1007/978-3-540-71035-6_7


Gaurav Khanna¹, Umit Catalyurek², Tahsin Kurc², P. Sadayappan¹ & Joel Saltz²

Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4376)

Abstract

Many scientific investigations deal with large amounts of data from simulations and experiments. Data analysis in such investigations typically involves extracting subsets of the data and then performing computations on the extracted data. Scheduling in this context requires efficient utilization of computational, storage, and network resources to optimize response time. The data-intensive nature of such applications necessitates data-locality-aware job scheduling algorithms. This paper proposes a hypergraph-based dynamic scheduling heuristic for a stream of independent I/O-intensive jobs with file-sharing behavior. The heuristic is based on an event-driven, run-time hypergraph model of the file-sharing characteristics among jobs. Experiments on a coupled compute/storage cluster, using workloads from the application domain of biomedical image analysis, show that it outperforms previously proposed strategies under a varying set of parameters.
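The abstract does not spell out the heuristic itself, so the Python sketch below only illustrates the general idea of modeling file sharing as a hypergraph, under stated assumptions. It assumes jobs are vertices and each file is a net (hyperedge) connecting the jobs that read it, and it scores a placement with the connectivity-minus-one metric common in hypergraph partitioning: the sum over files of size × (λ − 1), where λ is the number of distinct nodes running jobs that read the file. The online step is a simple greedy rule introduced here for illustration, not the authors' heuristic; `Job`, `Node`, `build_hypergraph`, `connectivity_cost`, `assign_on_arrival`, and `load_penalty` are all hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    files: set[str]  # files this job reads


@dataclass
class Node:
    name: str
    cached: set[str] = field(default_factory=set)  # files already on local disk
    queued_jobs: int = 0  # jobs assigned to this node so far


def build_hypergraph(jobs: list[Job]) -> dict[str, set[str]]:
    """Return nets: each file maps to the set of job names (vertices) that read it."""
    nets: dict[str, set[str]] = defaultdict(set)
    for job in jobs:
        for f in job.files:
            nets[f].add(job.name)
    return dict(nets)


def connectivity_cost(nets: dict[str, set[str]],
                      assignment: dict[str, str],
                      file_size: dict[str, int]) -> int:
    """Sum of size * (lambda - 1) over all nets, where lambda is the number of
    distinct nodes hosting at least one job that reads the file."""
    total = 0
    for f, members in nets.items():
        lam = len({assignment[j] for j in members})
        total += file_size[f] * (lam - 1)
    return total


def assign_on_arrival(job: Job, nodes: list[Node],
                      file_size: dict[str, int], load_penalty: int = 0) -> str:
    """Hypothetical event-driven greedy step: place an arriving job on the node
    with the lowest (bytes to fetch + load_penalty * queued jobs), breaking
    ties by node name. Assumes fetched files stay cached on that node."""
    def score(node: Node) -> tuple[int, str]:
        missing = sum(file_size[f] for f in job.files - node.cached)
        return (missing + load_penalty * node.queued_jobs, node.name)

    best = min(nodes, key=score)
    best.cached |= job.files
    best.queued_jobs += 1
    return best.name
```

For three jobs that read files `a.img` (400), `b.img` (300), and `c.img` (200) on two nodes, `load_penalty=0` puts every job on one node, which gives zero replication but no parallelism. A penalty of 500 spreads the jobs across both nodes at the cost of replicating `b.img` once (cost 300). This illustrates the trade-off between data locality and utilization of computational resources that a real scheduler has to balance.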

This research was supported in part by the National Science Foundation under Grants #CCF-0342615 and #CNS-0403342.



Author information

Authors and Affiliations

¹ Dept. of Computer Science and Engineering: Gaurav Khanna & P. Sadayappan
² Dept. of Biomedical Informatics, The Ohio State University: Umit Catalyurek, Tahsin Kurc & Joel Saltz


Editor information

Eitan Frachtenberg, Uwe Schwiegelshohn


About this paper

Cite this paper

Khanna, G., Catalyurek, U., Kurc, T., Sadayappan, P., Saltz, J. (2007). A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2006. Lecture Notes in Computer Science, vol 4376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71035-6_7


DOI: https://doi.org/10.1007/978-3-540-71035-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71034-9
Online ISBN: 978-3-540-71035-6
eBook Packages: Computer Science, Computer Science (R0)
