A Lower Bound Technique for Communication on BSP with Application to the FFT

  • Gianfranco Bilardi
  • Michele Scquizzato
  • Francesco Silvestri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


Abstract

Communication complexity is defined, within the Bulk Synchronous Parallel (BSP) model of computation, as the sum of the degrees of all the supersteps. A lower bound on the communication complexity of a given class of DAG computations is derived in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields a novel and tight lower bound for the FFT graph.
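As a small illustration of the switching-potential notion (this sketch is not code from the paper, only a way to make the definition concrete), the following brute force enumerates the permutations that a small butterfly, the graph underlying the FFT, can realize when each 2x2 node is set either "straight" or "cross". An n-input butterfly has (n/2)·log2(n) switches, and every switch setting yields a distinct permutation, so its switching potential is n^(n/2).

```python
from itertools import product

def butterfly_permutations(n):
    """Enumerate the distinct permutations realized by an n-input
    butterfly viewed as a switching network (n a power of two).
    Each 2x2 switch is set either straight (no swap) or cross (swap)."""
    levels = n.bit_length() - 1  # log2(n) levels of switches
    # Switches at level l pair the positions that differ in bit l,
    # in level order (level 0 first), as in the FFT butterfly graph.
    switches = [(i, i | (1 << l))
                for l in range(levels)
                for i in range(n) if not i & (1 << l)]
    perms = set()
    for setting in product((False, True), repeat=len(switches)):
        arr = list(range(n))  # route the identity through the network
        for (a, b), cross in zip(switches, setting):
            if cross:
                arr[a], arr[b] = arr[b], arr[a]
        perms.add(tuple(arr))
    return perms

# For n = 4 there are (n/2)*log2(n) = 4 switches; each of the 2^4 = 16
# settings routes every value along a unique path, so all settings give
# distinct permutations and the switching potential is n^(n/2) = 16
# (out of 4! = 24 permutations in total).
print(len(butterfly_permutations(4)))  # prints 16
```

The count grows as n^(n/2), so log2 of the switching potential is (n/2)·log2(n), proportional to the size of the FFT graph itself; the paper's lower-bound technique exploits (the logarithm of) this quantity.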


Keywords: Directed Acyclic Graph, Output Node, Communication Complexity, Switching Network, Bulk Synchronous Parallel





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gianfranco Bilardi (1)
  • Michele Scquizzato (1)
  • Francesco Silvestri (1)

  1. Department of Information Engineering, University of Padova, Italy
