Recommendations for using Simulated Annealing in task mapping


Abstract

A Multiprocessor System-on-Chip (MPSoC) may contain hundreds of processing elements (PEs) and thousands of tasks, but design productivity is lagging behind the evolution of HW platforms. One problem is application task mapping, which seeks a placement of tasks onto PEs that optimizes several criteria such as application runtime, intertask communication, memory usage, energy consumption, and real-time constraints, as well as area when PE selection or buffer sizing is combined with the mapping procedure. Among optimization algorithms for task mapping, we focus in this paper on Simulated Annealing (SA) heuristics. We present a literature survey and 5 general recommendations for reporting heuristics that should allow disciplined comparisons and reproduction by other researchers. Most importantly, we present our findings about SA parameter selection and 7 guidelines for obtaining a good trade-off between solution quality and the algorithm’s execution time. Notably, SA is compared against the global optimum. Thorough experiments were performed with 2–8 PEs, 11–32 tasks, 10 graphs per system, and 1000 independent runs, totaling over 500 CPU days of computation. Results show that SA offers a 4–6 orders of magnitude reduction in optimization time compared to brute force while achieving high-quality solutions. In fact, the globally optimal solution was reached with a 1.6–90 % probability when the problem size is around 1e9–4e9 possibilities, and there is an approximately 90 % probability of finding a solution that is at most 18 % worse than the optimum.
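The task mapping setup described above can be sketched as a basic simulated annealing loop. The following is a minimal illustration only, not the paper's SA+AT algorithm: the cost function (a simple makespan estimate), the geometric cooling schedule, and all parameter values (T0, Tf, q) are placeholder assumptions, and the task weights in the demo are made up.

```python
import math
import random

def simulated_annealing(num_tasks, num_pes, cost, T0=1.0, Tf=1e-3, q=0.95, L=None):
    """Anneal a tasks -> PEs assignment to minimize cost(mapping).

    cost() is an application-specific objective (e.g. estimated runtime).
    L is the number of moves per temperature level; as a rough default we
    scale it with problem size, L = num_tasks * (num_pes - 1).
    """
    if L is None:
        L = num_tasks * (num_pes - 1)
    mapping = [random.randrange(num_pes) for _ in range(num_tasks)]
    cur_cost = cost(mapping)
    best, best_cost = mapping[:], cur_cost
    T = T0
    while T > Tf:
        for _ in range(L):
            # Move: reassign one random task to a random PE.
            t = random.randrange(num_tasks)
            old_pe = mapping[t]
            mapping[t] = random.randrange(num_pes)
            new_cost = cost(mapping)
            delta = new_cost - cur_cost
            # Accept improving moves always, worsening moves with
            # probability exp(-delta / T).
            if delta <= 0 or random.random() < math.exp(-delta / T):
                cur_cost = new_cost
                if new_cost < best_cost:
                    best, best_cost = mapping[:], new_cost
            else:
                mapping[t] = old_pe  # reject: undo the move
        T *= q  # geometric cooling
    return best, best_cost

# Demo: balance 12 weighted tasks over 3 PEs (weights are hypothetical).
random.seed(1)
weights = [1 + (i % 3) for i in range(12)]

def makespan(m):
    # Placeholder cost: total work on the most-loaded PE.
    loads = [0, 0, 0]
    for t, pe in enumerate(m):
        loads[pe] += weights[t]
    return max(loads)

best, best_cost = simulated_annealing(12, 3, makespan)
```

A real cost function would model scheduling and intertask communication on the target platform; the acceptance rule and cooling loop stay the same.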





Corresponding author

Correspondence to Heikki Orsila.

Appendix: Convergence results to larger systems with 3–6 PEs


Table 13 Proportion of SA+AT runs that converged within p of the global optimum for 3 PEs and 21 nodes. A higher value is better. SA+AT chooses L=42. The 90 % level is marked in boldface in each column
Table 14 Approximate expected number of mappings for SA+AT with 3 PEs and 21 nodes. SA+AT chooses L=42. The best (smallest) values are in boldface for each performance level p (row)
Table 15 Proportion of SA+AT runs that converged within p of the global optimum for 4 PEs and 17 nodes. A higher value is better. SA+AT chooses L=51
Table 16 Approximate expected number of mappings for SA+AT with 4 PEs and 17 nodes. SA+AT chooses L=51
Table 17 Proportion of SA+AT runs that converged within p of the global optimum for 6 PEs and 13 nodes. A higher value is better. SA+AT chooses L=65
Table 18 Approximate expected number of mappings for SA+AT with 6 PEs and 13 nodes. SA+AT chooses L=65
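The tables above pair per-run convergence probabilities with approximate expected mapping counts. One simple way to relate the two quantities, assuming independent restarts, is a geometric model: if a single run evaluates M mappings and lands within p of the optimum with probability P, the expected number of mappings until the first success is M/P. The numbers in the sketch below are hypothetical, not taken from the tables.

```python
def expected_mappings(mappings_per_run, p_success):
    """Expected total mappings evaluated until one independent SA run
    first converges within p of the optimum (geometric restart model)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return mappings_per_run / p_success

# Hypothetical example: 3000 mappings per run, 60 % per-run success rate.
e = expected_mappings(3000, 0.6)  # -> 5000.0
```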


Cite this article

Orsila, H., Salminen, E. & Hämäläinen, T. Recommendations for using Simulated Annealing in task mapping. Des Autom Embed Syst 17, 53–85 (2013). https://doi.org/10.1007/s10617-013-9119-0


Keywords

  • Simulated Annealing
  • Task mapping
  • Task graph
  • Global optimum