A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4376)

Abstract

Many scientific investigations have to deal with large amounts of data from simulations and experiments. Data analysis in such investigations typically involves extracting subsets of the data and then performing computations on the extracted data. Scheduling in this context requires efficient use of computational, storage and network resources to minimize response time. The data-intensive nature of such applications calls for data-locality-aware job scheduling algorithms. This paper proposes a hypergraph-based dynamic scheduling heuristic for a stream of independent I/O-intensive jobs with file-sharing behavior. The heuristic relies on an event-driven, run-time hypergraph model of the file-sharing relationships among jobs. Experiments on a coupled compute/storage cluster, using workloads from the application domain of biomedical image analysis under a varying set of parameters, show that it outperforms previously proposed strategies.
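
To make the run-time hypergraph model concrete, here is a minimal Python sketch, assuming each job is described only by the set of input files it reads. Each pending job is a vertex and each file is a net (hyperedge) joining every pending job that reads it; arrival and dispatch events update the hypergraph as the job stream evolves. The names `FileSharingHypergraph`, `pick_job` and `run_on_node`, the toy file sizes, and the greedy scoring rule (most cached bytes first, then fewest bytes to fetch, then highest sharing degree) are hypothetical illustrations. They are not the authors' heuristic, whose details this abstract does not give.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    files: frozenset[str]  # names of the input files this job reads


class FileSharingHypergraph:
    """Hypothetical run-time hypergraph: each pending job is a vertex, and each
    file is a net (hyperedge) connecting every pending job that reads it."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}  # insertion order = arrival order
        self.nets: dict[str, set[str]] = defaultdict(set)

    def on_job_arrival(self, job: Job) -> None:
        # Arrival event: add a vertex and attach it to the net of each file.
        self.jobs[job.job_id] = job
        for f in job.files:
            self.nets[f].add(job.job_id)

    def on_job_dispatch(self, job_id: str) -> Job:
        # Dispatch event: detach the vertex from its nets and drop any net
        # that no longer connects a pending job.
        job = self.jobs.pop(job_id)
        for f in job.files:
            self.nets[f].discard(job_id)
            if not self.nets[f]:
                del self.nets[f]
        return job

    def sharing_degree(self, job_id: str) -> int:
        # Number of other pending jobs that share at least one file with job_id.
        neighbours: set[str] = set()
        for f in self.jobs[job_id].files:
            neighbours |= self.nets[f]
        neighbours.discard(job_id)
        return len(neighbours)


def pick_job(hg: FileSharingHypergraph, cached: set[str],
             file_size: dict[str, int]) -> str | None:
    """Illustrative greedy choice for one idle node: most cached input bytes
    first, then fewest bytes to fetch, then highest sharing degree. Ties go
    to the earliest-arrived job because max() keeps the first maximal item."""
    def score(job_id: str) -> tuple[int, int, int]:
        files = hg.jobs[job_id].files
        local = sum(file_size[f] for f in files if f in cached)
        remote = sum(file_size[f] for f in files if f not in cached)
        return (local, -remote, hg.sharing_degree(job_id))

    return max(hg.jobs, key=score) if hg.jobs else None


def run_on_node(hg: FileSharingHypergraph, cached: set[str],
                file_size: dict[str, int]) -> list[str]:
    # Drain the queue onto a single node; `cached` is updated in place
    # because each dispatched job leaves its input files on the node.
    order: list[str] = []
    while hg.jobs:
        job_id = pick_job(hg, cached, file_size)
        job = hg.on_job_dispatch(job_id)
        cached |= job.files
        order.append(job_id)
    return order


if __name__ == "__main__":
    sizes = {"a.img": 400, "b.img": 300, "c.img": 200}  # hypothetical sizes (MB)
    hg = FileSharingHypergraph()
    hg.on_job_arrival(Job("j1", frozenset({"a.img", "b.img"})))
    hg.on_job_arrival(Job("j2", frozenset({"b.img", "c.img"})))
    hg.on_job_arrival(Job("j3", frozenset({"c.img"})))
    print(run_on_node(hg, {"c.img"}, sizes))  # prints ['j3', 'j2', 'j1']
```

With a node that already caches `c.img`, the sketch runs `j3` first (all of its input is local), then `j2`, then `j1`, because each dispatch leaves that job's files in the node's cache and raises the locality score of the jobs that share them. A real scheduler would also have to balance load across many nodes, which this single-node sketch ignores.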

This research was supported in part by the National Science Foundation under Grants #CCF-0342615 and #CNS-0403342.

Editor information

Eitan Frachtenberg, Uwe Schwiegelshohn

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Khanna, G., Catalyurek, U., Kurc, T., Sadayappan, P., Saltz, J. (2007). A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2006. Lecture Notes in Computer Science, vol 4376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71035-6_7

  • DOI: https://doi.org/10.1007/978-3-540-71035-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71034-9

  • Online ISBN: 978-3-540-71035-6

  • eBook Packages: Computer Science, Computer Science (R0)
