The Journal of Supercomputing

, Volume 71, Issue 6, pp 2251–2274 | Cite as

Algorithmic aspects of graph reduction for hardware/software partitioning

  • Guiyuan Jiang
  • Jigang Wu
  • Siew-Kei Lam
  • Thambipillai Srikanthan
  • Jizhou Sun


The hardware/software (HW/SW) partitioning is a major concern in heterogeneous multi-processor system-on-a-chip design, where the large design space prohibits rapid identification of optimal HW/SW solutions to meet tight time-to-market constraints. In this paper, we propose graph reduction techniques to reduce the design space for HW/SW partitioning without sacrificing the partition quality. There are two major phases in the proposed approach: reducible sub-graph searching and sub-graph evaluation and reduction. In the former phase, we design a dynamic programming-based algorithm, named path flow algorithm (PFA), to identify reducible sub-graph candidates for directed acyclic graph (DAG) as most previous works use DAG as task graph model. We also propose algorithm DeLoop to transform an arbitrary directed graph into a DAG such that all reducible sub-graphs on the original graph can be detected by performing algorithm PFA on the DAG. Our approach overcomes the limitation of the existing approach by enabling the identification of candidate sub-graphs in arbitrary task graphs. In latter phase, we propose a reduction model which enables accurate estimation of task execution time on hardware and design a method to select candidate sub-graphs for reduction. Experimental results demonstrate that the proposed methods not only reduce the design space, but also notably improve the partitioning quality since hardware-parallel execution of tasks is taken into account in the proposed sub-graph reduction model.


Hardware/software partitioning Algorithm Graph reduction Reducible sub-graph 


  1. 1.
    Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Comput Gr Forum 8(1):3–11Google Scholar
  2. 2.
    Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput. 10(2):188–192CrossRefGoogle Scholar
  3. 3.
    Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor theoretical properties and algorithms. Parallel Comput 21(11):1783–1805CrossRefGoogle Scholar
  4. 4.
    Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference, pp 349–357Google Scholar
  5. 5.
    Arató P, Mann ZÁ, Orbán A (2005) Algorithmic aspects of hardware/software partitioning. ACM Trans Des Autom Electron Syst (TODAES) 10(1):136–156Google Scholar
  6. 6.
    Tahaee SA, Jahangir AH (2010) A polynomial algorithm for partitioning problems. ACM Trans Embed Comput Syst (TECS) 9(4):1–38. (Article 34)Google Scholar
  7. 7.
    Vahid F, Gajski DD, Gong J (1994) A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning. In: Proceedings of the conference on the European design automation (EURO-DAC), pp 214–219Google Scholar
  8. 8.
    Dick RP, Jha NK (1997) Mogac: a multiobjective genetic algorithm for the co-synthesis of hardware–software embedded systems, In: Proceedings of 1997 IEEE/ACM international conference on computer-aided design (ICCAD), pp 522–529Google Scholar
  9. 9.
    López-Vallejo M, López JC (2003) On the hardware–software partitioning problem: system modeling and partitioning techniques. ACM Trans Des Autom Electron Syst (TODAES) 8(3):269–297CrossRefGoogle Scholar
  10. 10.
    Vahid F (2002) Partitioning sequential programs for cad using a three-step approach. ACM Trans Des Autom Electron Syst (TODAES) 7(3):413–429CrossRefGoogle Scholar
  11. 11.
    Wu J, Srikanthan T, Tao J (2008) Algorithmic aspects for functional partitioning and scheduling in hardware/software co-design. Des Autom Embed Syst 12(4):345–375CrossRefGoogle Scholar
  12. 12.
    Henkel J (1999) A low power hardware/software partitioning approach for core-based embedded systems. In: Proceedings of the 36th ACM/IEEE design automation conference (DAC), pp 122–127Google Scholar
  13. 13.
    Stitt G, Vahid F (2002) The energy advantages of microprocessor platforms with on-chip configurable logic. IEEE Des Test Comput 19(6):36–43CrossRefGoogle Scholar
  14. 14.
    Mu J, Lysecky R (2009) Autonomous hardware/software partitioning and voltage/frequency scaling for low-power embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 15(1):1–20CrossRefGoogle Scholar
  15. 15.
    Lysecky R (2007) Low-power warp processor for power efficient high-performance embedded systems. In: Proceedings of the design, automation and test in Europe conference and exhibition (DATE), pp 1–6Google Scholar
  16. 16.
    Banerjee S, Dutt N (2004) Efficient search space exploration for HW–SW partitioning. In: Proceedings of the IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis (CODES+ISSS), pp 122–127Google Scholar
  17. 17.
    Wu J, Srikanthan T, Chen G (2010) Algorithmic aspects of hardware/software partitioning: 1D search algorithms. IEEE Trans Comput 59(4):532–544CrossRefMathSciNetGoogle Scholar
  18. 18.
    Tahaee SA, Jahangir AH, Habibi-Masouleh H (2008) Improving the performance of heuristic searches with judicious initial point selection. In: Proceedings of the 5th IEEE international symposium on embedded computing, pp 14–19Google Scholar
  19. 19.
    Wu J, Srikanthan T (2006) Low-complex dynamic programming algorithm for hardware/software partitioning. Inf Process Lett 98(2):41–46CrossRefMATHGoogle Scholar
  20. 20.
    Niemann R, Marwedel P (1997) An algorithm for hardware/software partitioning using mixed integer linear programming. Des Autom Embed Syst 2(2):165–193CrossRefGoogle Scholar
  21. 21.
    Wu J, Wang P, Lam SK, Srikanthan T (2013) Efficient heuristic and tabu search for hardware/software partitioning. J Supercomput 66(1):118–134CrossRefGoogle Scholar
  22. 22.
    Monshizadeh N, Trentelman HL, Camlibel MK (2014) Projection based model reduction of multi-agent systems using graph partitions. IEEE Trans Control Netw Syst 1(2):145–154CrossRefMathSciNetGoogle Scholar
  23. 23.
    Shen C, Tong W (2013) A novel method of adopting graph reduction for resource management in parallel computing model. In: Proceedings of the 2013 IEEE international conference on green computing and communications and ieee internet of things and IEEE cyber, physical and social computing, pp 2218–2220Google Scholar
  24. 24.
    Song Y, Shi G (2012) Hierarchical graph reduction approach to symbolic circuit analysis with data sharing and cancellation-free properties. In: Proceedings of the 2012 17th Asia and South Pacific design automation conference (ASP-DAC), pp 541–546Google Scholar
  25. 25.
    Xu Y, Chu C (2009) GREMA: graph reduction based efficient mask assignment for double patterning technology. In: Proceedings of the IEEE/ACM international conference on computer-aided design (ICCAD), pp 601–606Google Scholar
  26. 26.
    Henke J, Ernst R (1995) A path-based technique for estimating hardware runtime in Hw/Sw-cosynthesis. In: Proceedings of the 8th international symposium on system synthesis, pp 116–121Google Scholar
  27. 27.
    Maciel P, Barros E, Rosenstiel W (1997) Computing communication cost by Petri nets for hardware/software codesign. In: Proceedings of the 8th international workshop on rapid system prototyping (RSP’97) shortening the path from specification to prototype, pp 44–56Google Scholar
  28. 28.
    Henkel J, Ernst R (1998) High-level estimation techniques for usage in hardware/software co-design. In: Proceedings of the Asia and South Pacific design automation conference (ASP-DAC), pp 353–360Google Scholar
  29. 29.
    Li Y, Henkel J (1998) A framework for estimating and minimizing energy dissipation of embedded HW/SW systems. In: Proceedings of the 35th design automation conference (DAC), pp 188–193Google Scholar
  30. 30.
    Junior M, Neto S, Maciel P, Lima R, Ribeiro A, Barreto R, Tavares E, Braga F (2006) Analyzing software performance and energy consumption of embedded systems by probabilistic modeling: an approach based on coloured Petri nets. In: Proceedings of the 27th international conference on applications and theory of Petri nets and other models of concurrency (ICATPN), Turku, pp.261–281 (pp 26–30)Google Scholar
  31. 31.
    Ye W, Ernst R, Benner T, Henkel J (1993) Fast timing analysis for hardware–software co-synthesis. In: Proceedings of the IEEE international conference on computer design: VLSI in computers and processors (ICCD), pp 452–457Google Scholar
  32. 32.
    Vahid F, Gajski DD (1995) Closeness metrics for system-level functional partitioning. In: Proceedings of the European design automation conference with EURO-VHDL (EURO-DAC), pp 328–333Google Scholar
  33. 33.
    Lam SK, Srikanthan T (2009) Rapid design of area-efficient custom instructions for reconfigurable embedded processing. J Syst Archit 55(1):1–14CrossRefGoogle Scholar
  34. 34.
    Lam SK, Srikanthan T, Clarke CT (2011) Architecture-aware technique for mapping area-time efficient custom instructions onto FPGAs. IEEE Trans Comput 60(5):680–692CrossRefMathSciNetGoogle Scholar
  35. 35.
    Chuong LM, Lam SK, Srikanthan T (2009) Area-time estimation of controller for porting C-based functions onto FPGA. In: Proceedings of the IEEE/IFIP international symposium on rapid system prototyping (RSP), pp 145–151Google Scholar
  36. 36.
    Zhao K, Bian J (2011) Instruction-level hardware/software partition through DFG exploration. In: Proceedings of the 15th international conference on computer supported cooperative work in design (CSCWD), pp 55–60Google Scholar
  37. 37.
    Wang D, Li S, Dou Y (2008) Collaborative hardware/software partition of coarse-grained reconfigurable system using evolutionary ant colony optimization. In: Proceedings of the 2008 Asia and South Pacific design automation conference (ASP-DAC), pp 679–684Google Scholar
  38. 38.
    Bouchhima A, Gerin P, Petrot F (2009) Automatic instrumentation of embedded software for high level hardware/software co-simulation. In: Proceedings of the 2009 Asia and South Pacific design automation conference (ASP-DAC), pp 546–551Google Scholar
  39. 39.
    Wolf W (1996) Object-oriented cosynthesis of distributed embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 1(3):301–314CrossRefGoogle Scholar
  40. 40.
    Henkel J, Ernst R (2001) An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques. IEEE Trans Very Large Scale Integr (VLSI) Syst 9(2):273–289Google Scholar
  41. 41.
    Gibson D, Kumar R, McCurley KS, Tomkins A (2006) Dense subgraph extraction. In: Cook DJ, Holder LB (eds) Mining graph data. Wiley, HobokenGoogle Scholar
  42. 42.
    Cuzzocrea A, Song IY (2014) Big graph analytics: the state of the art and future research agenda. In: Proceedings of 17th international workshop on data warehousing and OLAP. ACM, pp 99–101Google Scholar
  43. 43.
    Quamar A, Deshpande A, Lin J (2014) NScale: neighborhood-centric analytics on large graphs. Proc VLDB 7(13):1673–1676CrossRefGoogle Scholar
  44. 44.
    Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 581–586Google Scholar
  45. 45.
    Youness H, Hassan M, Sakanushi K, Takeuchi Y, Imai M, Salem A, Wahdan AM, Moness M (2009) A high performance algorithm for scheduling and hardware–software partitioning on MPSoCs. In: Proceedings of the 4th international conference on design and technology of integrated systems in nanoscale era (DTIS’09), pp 71–76Google Scholar
  46. 46.
    Han H, Liu W, Wu J, Jiang G (2013) Efficient algorithm for hardware/software partitioning and scheduling on MPSoC. J Comput 8(1):61–68CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Guiyuan Jiang
    • 1
    • 2
  • Jigang Wu
    • 2
    • 3
  • Siew-Kei Lam
    • 4
  • Thambipillai Srikanthan
    • 4
  • Jizhou Sun
    • 1
  1. 1.School of Computer Science and TechnologyTianjin UniversityTianjinChina
  2. 2.School of Computer Science and TechnologyGuangdong University of TechnologyGuangzhouChina
  3. 3.School of Computer Science and Software EngineeringTianjin Polytechnic UniversityTianjinChina
  4. 4.School of Computer EngineeringNanyang Technological UniversitySingaporeSingapore

Personalised recommendations