The Divisible Load Balance Problem and Its Application to Phylogenetic Inference

  • Kassian Kobert
  • Tomáš Flouri
  • Andre Aberer
  • Alexandros Stamatakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8701)


Motivated by load-balance issues in parallel computations of the phylogenetic likelihood function, we address the problem of distributing divisible items among a given number of bins. The task is to balance the sum of (fractional) item sizes per bin while minimizing the maximum number of unique elements in any bin. We show that this problem is NP-hard and give a polynomial-time approximation algorithm that yields a solution in which the sums of (possibly fractional) item sizes are balanced across bins. Moreover, the maximum number of unique elements per bin is guaranteed to exceed that of the optimal solution by at most one element. We implement the algorithm in two production-level parallel codes for large-scale likelihood-based phylogenetic inference: ExaML and ExaBayes. For ExaML, we observe best-case runtime improvements of up to a factor of 5.9 compared to the previously implemented data distribution algorithms.
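
The problem statement can be made concrete with a small sketch. The Python function below is not the approximation algorithm from the paper, whose steps are not given in this abstract. It is a naive wrap-around fill, written here only for illustration, and the names `distribute`, `sizes`, and `num_bins` are hypothetical. It balances the per-bin sums exactly by splitting an item whenever a bin fills up, but it makes no attempt to minimize the number of unique items per bin, which is the part of the problem the abstract identifies as hard.

```python
from fractions import Fraction


def distribute(sizes, num_bins):
    """Naive wrap-around fill (illustrative assumption, not the paper's method).

    Returns one dict per bin, mapping item index -> share of that item's size
    placed in the bin. Every bin receives exactly sum(sizes) / num_bins.
    """
    total = sum(Fraction(s) for s in sizes)
    capacity = total / num_bins          # target load per bin
    bins = [dict() for _ in range(num_bins)]
    current = 0                          # index of the bin being filled
    room = capacity                      # space left in the current bin

    for item, size in enumerate(sizes):
        remaining = Fraction(size)
        while remaining > 0:
            # Place as much of the item as fits; the rest spills over.
            part = min(remaining, room)
            bins[current][item] = part
            remaining -= part
            room -= part
            if room == 0 and current < num_bins - 1:
                current += 1
                room = capacity
    return bins


def max_unique_items(bins):
    """The quantity the paper seeks to minimize: most distinct items in any bin."""
    return max(len(b) for b in bins)


if __name__ == "__main__":
    result = distribute([5, 3, 2, 2], 3)
    for i, b in enumerate(result):
        print(i, {k: str(v) for k, v in b.items()}, "load =", sum(b.values()))
    print("max unique items per bin:", max_unique_items(result))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the last bin fills to exactly the target load without floating-point drift. In this sketch the order of the items determines where the splits fall, so a different ordering can change the maximum number of unique items per bin.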


Keywords (added by machine, not by the authors): Item Size; Polynomial Time Approximation Algorithm; Divisible Load; Alignment Site; Startup Latency





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Kassian Kobert (1)
  • Tomáš Flouri (1)
  • Andre Aberer (1)
  • Alexandros Stamatakis (1, 2)
  1. Heidelberg Institute for Theoretical Studies, Germany
  2. Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
