Journal of Signal Processing Systems, Volume 67, Issue 3, pp 213–228

Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP

  • Jian-Wei Niu
  • Meikang Qiu
  • Xiaofei Wang
  • Jiayin Li
  • Gang Wu
  • Tianzhou Chen


Cost minimization and execution-time reduction have become the most important issues in today’s real-time embedded systems. For DSP (Digital Signal Processing) applications running on these systems, loops are the most critical part for performance optimization. To optimize loop iteration patterns, we need to schedule the loop execution order. Because the execution times of tasks are uncertain, we model them as random variables and propose a novel data-flow graph model, called HPDFG (Heterogeneous Probabilistic Data-Flow Graph), to model DSP applications on embedded systems. We then propose a novel algorithm, LSHAPE, that minimizes cost while satisfying timing constraints. First, we use data mining methods to estimate the probability distributions of the execution-time variables. Second, we rotate the loops in the application to explore different possible execution patterns. Finally, we combine list scheduling and dynamic programming to generate a near-optimal task allocation and core-mode assignment. Experimental results demonstrate the effectiveness of our algorithm and show that our approach handles loops efficiently.
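The abstract describes LSHAPE only at a high level. As a minimal sketch of the dynamic-programming idea behind probabilistic cost minimization (not the authors' algorithm), the Python below assumes a simple chain of tasks with independent execution times, uses empirical frequencies from profiled samples in place of the data mining step, and omits loop rotation and list scheduling entirely. For each task it picks a core type and a time budget so that the summed budgets meet `deadline` and the guaranteed probability is at least `theta`, at minimum total cost. All names (`estimate_cdf`, `pareto`, `assign_path`) and the sample data are hypothetical.

```python
# Hypothetical sketch, NOT the authors' LSHAPE: it omits loop rotation and
# list scheduling and handles only a simple chain of independent tasks.
from collections import Counter


def estimate_cdf(samples):
    """Empirical CDF from profiled execution times: maps each observed
    time t to P(X <= t). Stands in for the data-mining estimation step."""
    counts = Counter(samples)
    n = len(samples)
    cdf, running = {}, 0
    for t in sorted(counts):
        running += counts[t]
        cdf[t] = running / n
    return cdf


def pareto(front):
    """Drop (prob, cost, choices) entries dominated by a cheaper-or-equal
    entry with higher-or-equal probability."""
    front.sort(key=lambda e: (e[1], -e[0]))  # cost ascending, prob descending
    kept, best_prob = [], -1.0
    for prob, cost, choices in front:
        if prob > best_prob:
            kept.append((prob, cost, choices))
            best_prob = prob
    return kept


def assign_path(tasks, deadline, theta):
    """tasks: list (in chain order) of {core_type: (samples, cost)}.
    Each task picks a core type and a time budget t from its empirical
    support. With independent tasks, the product of P(X_i <= t_i) is a
    lower bound on P(total time <= sum of budgets).
    Returns the minimum-cost (cost, prob, choices) whose budget sum is
    <= deadline and whose prob is >= theta, or None if none exists."""
    states = {0: [(1.0, 0, [])]}  # budget used so far -> Pareto front
    for options in tasks:
        new_states = {}
        for core, (samples, cost) in options.items():
            for t, p in estimate_cdf(samples).items():
                for used, front in states.items():
                    if used + t > deadline:
                        continue  # this budget would violate the timing constraint
                    bucket = new_states.setdefault(used + t, [])
                    for prob, c, choices in front:
                        bucket.append((prob * p, c + cost, choices + [(core, t)]))
        states = {b: pareto(f) for b, f in new_states.items()}

    best = None
    for front in states.values():
        for prob, c, choices in front:
            if prob >= theta and (best is None or c < best[0]):
                best = (c, prob, choices)
    return best


# Hypothetical profiling data for two tasks on a fast and a slow core type.
tasks = [
    {"fast": ([2, 2, 2, 3], 10), "slow": ([4, 4, 5, 5], 3)},
    {"fast": ([1, 1, 2, 2], 8), "slow": ([3, 3, 3, 4], 2)},
]
print(assign_path(tasks, deadline=7, theta=0.9))
# → (11, 1.0, [('slow', 5), ('fast', 2)])
```

Pruning each budget bucket to its Pareto front keeps the table small without discarding any assignment that could still be optimal. With `deadline=4`, no assignment reaches `theta=0.9`, so `assign_path` returns `None`.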


Keywords: Data mining · Prefetching · Cost minimization · Heterogeneous · Timing



This work was supported in part by the Research Fund of the State Key Laboratory of Software Development Environment (BUAA SKLSDE-2010ZX-13), NSFC 61071061, and NSFC 60873241; NSFC 61071061 and the University of Kentucky Start-Up Fund; the National High-Tech R&D Plan of China (2009AA01Z123); and NSFC 61070001, RFEB Zhejiang Y200803333 and Y200909683, the State Key Lab of High-end Server Storage Tech. (2009HSSA10), the National Key Lab STASI, and SFKPC 2009ZX01039-002-001-04, 2009ZX03001-016, and 2009ZX03004-005.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Jian-Wei Niu (1)
  • Meikang Qiu (2)
  • Xiaofei Wang (3)
  • Jiayin Li (2)
  • Gang Wu (4)
  • Tianzhou Chen (5)

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
  2. Department of Electrical and Computer Engineering, University of Kentucky, Lexington, USA
  3. School of Computer Science and Engineering, Seoul National University, Seoul, Korea
  4. School of Software, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
  5. College of Computer Science, Zhejiang University, Hangzhou, People’s Republic of China
