Optimization of Parallel FDTD Computations Based on Structural Redeployment of Macro Data Flow Nodes

  • Adam Smyk
  • Marek Tudruj
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3911)


This paper presents a methodology for profiling macro data flow graphs (MDFGs) that represent computation and communication patterns for the Finite Difference Time Domain (FDTD) problem in irregular computational areas. MDFG optimization is performed in three phases: (1) simulation area partitioning with generation of an initial MDFG; (2) macro data node merging with static load balancing to obtain a given number of macro nodes, together with communication optimization to minimize (balance) inter-node data transmissions; and (3) computational cell redeployment to take computational system restrictions into account. The efficiency of computations for several communication systems (MPI, RDMA RB, SHMEM) is discussed. Experimental results obtained by simulation are presented.
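
The abstract does not give the merging algorithm itself, so the following is only a minimal sketch of what the second phase could look like, under stated assumptions: the MDFG is an undirected graph whose node weights are computational loads (for example, numbers of FDTD cells) and whose edge weights are communication volumes. The function name `merge_macro_nodes`, the greedy rule (merge the lightest node into the neighbour it communicates with most), and the data layout are all hypothetical and are not taken from the paper.

```python
def merge_macro_nodes(weights, edges, target):
    """Greedily merge macro nodes until only `target` remain (illustrative only).

    weights: dict node -> computational load (assumed: number of FDTD cells)
    edges:   dict frozenset({a, b}) -> communication volume between a and b
    Returns (weights, edges, members) for the merged graph.
    """
    weights = dict(weights)              # work on copies, leave inputs intact
    edges = dict(edges)
    members = {n: [n] for n in weights}  # original nodes inside each macro node

    while len(weights) > target:
        # Load balancing heuristic: take the lightest node that has a neighbour.
        for node in sorted(weights, key=lambda n: (weights[n], n)):
            nbrs = {next(iter(e - {node})): w
                    for e, w in edges.items() if node in e}
            if nbrs:
                break
        else:
            raise ValueError("no mergeable node left; graph has no edges")

        # Communication heuristic: merge along the heaviest edge so that this
        # traffic becomes internal; break ties in favour of the lighter partner.
        partner = max(nbrs, key=lambda m: (nbrs[m], -weights[m]))

        weights[partner] += weights.pop(node)
        members[partner] += members.pop(node)

        # Redirect the remaining edges of `node` to `partner`, summing volumes.
        for e, w in list(edges.items()):
            if node in e:
                del edges[e]
                other = next(iter(e - {node}))
                if other != partner:
                    key = frozenset({partner, other})
                    edges[key] = edges.get(key, 0) + w

    return weights, edges, members
```

For a four-node chain a-b-c-d with loads 1, 4, 2, 3 and communication volumes 5, 1, 2, calling `merge_macro_nodes` with `target=2` yields two macro nodes of equal load connected by a single edge of volume 1. A real implementation would also have to respect the system restrictions addressed in the third phase, which this sketch ignores.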


Execution Time; Average Execution Time; Simulation Area; Computational Area; Graph Execution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Adam Smyk (1)
  • Marek Tudruj (2)
  1. Polish-Japanese Institute of Information Technology, Warsaw, Poland
  2. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
