Minimizing the stretch when scheduling flows of divisible requests
In this paper, we consider the problem of scheduling distributed biological sequence comparison applications. This problem lies in the divisible load framework with negligible communication costs. Thus far, very few results have been proposed for this model. We discuss and select relevant metrics for this framework, namely max-stretch and sum-stretch. We explain the relationship between our model and the preemptive single-processor case, and we show how to extend algorithms proposed in the literature for the single-processor model to the divisible multi-processor problem domain. We recall known results on closely related problems and show how to minimize the max-stretch on unrelated machines, either in the divisible load model or with preemption. We derive new lower bounds on the competitive ratio of any online algorithm, present new competitiveness results for existing algorithms, and develop several new online heuristics. We also address the Pareto optimization of max-stretch. We then extensively study the performance of these algorithms and heuristics under realistic scenarios. Our study shows that all previously proposed guaranteed heuristics for max-stretch for the single-processor model are inefficient in practice. In contrast, we show that our online algorithms based on linear programming are near-optimal in practice for max-stretch. Our study also clearly suggests heuristics that are efficient for both metrics, although a combined optimization is in theory not possible in the general case.
Keywords: Bioinformatics, Heterogeneous computing, Scheduling, Divisible load, Linear programming, Stretch
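The abstract names max-stretch and sum-stretch without defining them. The following is a minimal sketch, assuming the usual definition of stretch from the scheduling literature: a job's flow time (completion time minus release time) divided by its size. The names `Job`, `stretch`, `max_stretch`, and `sum_stretch`, and the two-job instance, are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float     # time at which the request arrives
    size: float        # processing time if the job ran alone (assumed > 0)
    completion: float  # time at which the job finishes in a given schedule


def stretch(job: Job) -> float:
    """Flow time divided by size: the slowdown the job experienced."""
    return (job.completion - job.release) / job.size


def max_stretch(jobs: list[Job]) -> float:
    """Worst slowdown over all jobs (a fairness-oriented metric)."""
    return max(stretch(j) for j in jobs)


def sum_stretch(jobs: list[Job]) -> float:
    """Total slowdown over all jobs (an average-performance metric)."""
    return sum(stretch(j) for j in jobs)


# Hypothetical single-processor instance: a long job released at t=0
# (size 4) and a short job released at t=1 (size 1).

# Schedule 1: first-come first-served, no preemption.
# The long job runs on [0, 4], the short job on [4, 5].
fcfs = [Job(release=0, size=4, completion=4),
        Job(release=1, size=1, completion=5)]

# Schedule 2: the short job preempts the long one on arrival.
# The long job runs on [0, 1] and [2, 5], the short job on [1, 2].
preemptive = [Job(release=0, size=4, completion=5),
              Job(release=1, size=1, completion=2)]

for name, schedule in [("FCFS", fcfs), ("preemptive", preemptive)]:
    print(name, max_stretch(schedule), sum_stretch(schedule))
```

In this toy instance, letting the short job preempt the long one improves both metrics, because delaying a short job inflates its stretch far more than delaying a long job by the same amount. This does not hold in general: as the abstract notes, the two metrics cannot always be optimized together.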