
Workload Balancing and Throughput Optimization for Heterogeneous Systems Subject to Failures

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 6852)


In this paper, we study the problem of optimizing the throughput of streaming applications for heterogeneous platforms subject to failures. The applications are linear graphs of tasks (pipelines), and a type is associated with each task. The challenge is to map tasks onto the machines of a target platform under the constraint that each machine is specialized to process a single task type, in order to avoid costly context or setup changes. The objective is to maximize the throughput, i.e., the rate at which jobs can be processed when accounting for failures. For identical machines, we prove that an optimal solution can be computed in polynomial time. However, the problem becomes NP-hard when two machines can compute the same task type at different speeds. We design several polynomial-time heuristics, and simulation results demonstrate their efficiency.
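The throughput objective can be illustrated with a small Python sketch. It rests on a simplified model assumed here for illustration, not taken from the paper: each task T_i has a processing time w_i and a failure rate f_i that do not depend on the machine (the identical-machines case), an item that fails has already consumed its processing time and is then lost, and the work of a task type can be split evenly among the machines specialized to that type. Under these assumptions, producing one finished item requires task i to process x_i = x_{i+1} / (1 - f_i) items (with one item leaving the last task), type t carries the load W_t = sum of x_i * w_i over the tasks of type t, and with m_t machines per type the period is max_t W_t / m_t, the throughput being its inverse. The function names `required_inputs`, `type_loads` and `allocate_machines` are hypothetical. The greedy rule that hands each spare machine to the current bottleneck type returns an optimal allocation for this simplified objective; it is not presented as the authors' algorithm.

```python
import heapq
from collections import defaultdict


def required_inputs(fail_rates):
    """Return x[i], the number of items task i must process per finished item.

    Assumption (illustrative model): an item that fails at task i has consumed
    its processing time there and is lost, so task i must process
    x[i] = x[i+1] / (1 - f_i) items, with one item leaving the last task.
    """
    x = [0.0] * len(fail_rates)
    need = 1.0  # items that must leave the current task
    for i in range(len(fail_rates) - 1, -1, -1):
        f = fail_rates[i]
        if not 0.0 <= f < 1.0:
            raise ValueError("failure rates must lie in [0, 1)")
        need /= 1.0 - f
        x[i] = need
    return x


def type_loads(types, work, fail_rates):
    """Aggregate the failure-inflated work x[i] * w[i] of every task per type."""
    x = required_inputs(fail_rates)
    loads = defaultdict(float)
    for t, w, xi in zip(types, work, x):
        loads[t] += xi * w
    return dict(loads)


def allocate_machines(loads, p):
    """Greedily assign p identical machines to task types (one type per machine).

    Every type first gets one machine; each spare machine then goes to the type
    with the largest per-machine load W_t / m_t. Returns (allocation, period).
    """
    if p < len(loads):
        raise ValueError("need at least one machine per task type")
    alloc = {t: 1 for t in loads}
    # Max-heap on W_t / m_t, emulated with negated keys.
    heap = [(-load, t) for t, load in loads.items()]
    heapq.heapify(heap)
    for _ in range(p - len(loads)):
        _, t = heapq.heappop(heap)  # current bottleneck type
        alloc[t] += 1
        heapq.heappush(heap, (-loads[t] / alloc[t], t))
    period = max(loads[t] / alloc[t] for t in loads)
    return alloc, period


if __name__ == "__main__":
    loads = type_loads(["A", "B", "A"], [2.0, 3.0, 1.0], [0.0, 0.5, 0.0])
    alloc, period = allocate_machines(loads, 3)
    print(loads, alloc, period, 1.0 / period)
```

For example, with task types A, B, A, processing times 2, 3, 1 and failure rates 0, 0.5, 0, the loads are W_A = 5 and W_B = 6; with three machines the sketch assigns one machine to A and two to B, giving a period of 5 (a throughput of 0.2 items per time unit).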







Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benoit, A., Dobrila, A., Nicod, JM., Philippe, L. (2011). Workload Balancing and Throughput Optimization for Heterogeneous Systems Subject to Failures. In: Jeannot, E., Namyst, R., Roman, J. (eds) Euro-Par 2011 Parallel Processing. Euro-Par 2011. Lecture Notes in Computer Science, vol 6852. Springer, Berlin, Heidelberg.


  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23399-9

  • Online ISBN: 978-3-642-23400-2

  • eBook Packages: Computer Science, Computer Science (R0)