Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters

  • Christophe Cérin
  • Jean-Christophe Dubacq
  • Jean-Louis Roch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3947)


The aim of this paper is to introduce general techniques for optimizing the parallel execution time of sorting on distributed architectures whose processors run at various speeds. Such an application requires a partitioning step. For uniformly related processors (processor speeds are related by a constant factor), we develop a constant-time technique for controlling processor load and execution time in a heterogeneous environment, as well as a technique to deal with unknown cost functions. For non-uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in \(\mathcal{O}(p)\) (where p is the number of processors), independent of the problem size n. Consequently, the overhead is small relative to the problem we deal with, but the approach is inherently limited by what is known about the time complexity of the portion of code following the partitioning.
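
To make the partitioning step concrete, here is a minimal sketch of load balancing for uniformly related processors. It is not the paper's constant-time technique. It assumes that the code following the partitioning costs x log2(x) operations on a chunk of x items, that each processor is described by a single relative speed, and that n is at least the number of processors. It uses plain numerical bisection to find the common finish time. The names partition_sizes and chunk_for_time are hypothetical.

```python
import math


def chunk_for_time(t, speed):
    """Return the largest real x >= 1 with x * log2(x) <= t * speed (bisection)."""
    work = t * speed
    lo, hi = 1.0, 2.0
    # Grow the upper bound until it exceeds the available work.
    while hi * math.log2(hi) < work:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mid * math.log2(mid) <= work:
            lo = mid
        else:
            hi = mid
    return lo


def partition_sizes(n, speeds):
    """Split n items so that x_i * log2(x_i) / speeds[i] is nearly equal for all i.

    Assumes n >= len(speeds) and an x*log2(x) cost model (an assumption of
    this sketch, not a statement about the paper's cost functions).
    """
    # Bisection on the common finish time t: the total chunk size grows with t.
    t_lo = 0.0
    t_hi = n * math.log2(max(n, 2)) / min(speeds)
    for _ in range(100):
        t = (t_lo + t_hi) / 2.0
        if sum(chunk_for_time(t, s) for s in speeds) < n:
            t_lo = t
        else:
            t_hi = t
    real = [chunk_for_time(t_hi, s) for s in speeds]
    sizes = [int(x) for x in real]
    # Hand out the items lost to rounding, largest fractional part first.
    leftover = n - sum(sizes)
    order = sorted(range(len(speeds)), key=lambda i: real[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    print(partition_sizes(7000, [1.0, 2.0, 4.0]))
```

Under this cost model, a processor that is four times faster receives fewer than four times as many items, because the cost grows faster than linearly in the chunk size. A split proportional to speed would therefore leave the fastest processor finishing last.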


Keywords: parallel in-core sorting; heterogeneous computing; complexity of parallel algorithms; data distribution





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christophe Cérin (1)
  • Jean-Christophe Dubacq (1)
  • Jean-Louis Roch (2)
  1. LIPN, CNRS UMR 7030, Université de Paris Nord, Villetaneuse, France
  2. CNRS – INRIA – INPG – UJF, Projet MOAIS, ID-IMAG, Montbonnot-Saint-Martin, France
