Journal of Signal Processing Systems, Volume 57, Issue 2, pp 263–283

Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications

  • B. Girodias
  • Y. Bouchebaba
  • G. Nicolescu
  • E. M. Aboulhamid
  • P. Paulin
  • B. Lavigueur


Multiprocessor System-on-Chip is one of the main drivers of the semiconductor industry revolution, enabling the integration of complex functionality on a single chip. Techniques for processor design and application optimization can be combined to design these systems more efficiently. In particular, memory optimization techniques, which improve data locality, can be combined with multithreading technology, which improves overall processor efficiency. The main challenge in combining them is adapting memory optimization techniques to the high parallelism offered by multithreading environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on memory optimization techniques. It discusses the different types of parallelization (fine and coarse grain) and their influence on memory optimization techniques. It also presents some improvements to existing memory optimization techniques, as well as the adaptations necessary to use them in this type of environment.
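
As a rough illustration of how a data-locality optimization interacts with coarse-grain parallelization, the sketch below uses loop fusion as a stand-in for the memory optimization techniques discussed. The choice of loop fusion, the two-stage pixel computation, and the names `unfused`, `fused`, and `fused_coarse_grain` are assumptions made for this example and are not taken from the paper. Python threads are used only to show the structure of the partitioning; they do not model the behaviour of an MPSoC platform.

```python
from concurrent.futures import ThreadPoolExecutor


def unfused(pixels):
    # Two separate loops: the whole intermediate array is written out,
    # then read back in by the second loop.
    scaled = [2 * p for p in pixels]
    return [s + 1 for s in scaled]


def fused(pixels, start, stop):
    # One fused loop over [start, stop): each intermediate value is
    # consumed immediately after it is produced, so no intermediate
    # array is stored.
    out = []
    for i in range(start, stop):
        s = 2 * pixels[i]
        out.append(s + 1)
    return out


def fused_coarse_grain(pixels, workers=4):
    # Coarse-grain split (hypothetical scheme): each worker gets one
    # contiguous chunk and runs the fused loop on it independently.
    n = len(pixels)
    chunk = max(1, (n + workers - 1) // workers)
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: fused(pixels, *b), bounds))
    # pool.map preserves input order, so chunks are concatenated in order.
    return [v for part in parts for v in part]
```

Because each output element here depends only on the input element at the same index, the fused loop can be split across workers without any communication. A computation with dependences between neighbouring iterations would need extra care at chunk boundaries, which is the kind of adaptation the abstract refers to.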


Keywords: Multiprocessors System on Chip (MPSoC), Multi-threading, Optimizations, Parallelism, Memory, Multimedia


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • B. Girodias (1)
  • Y. Bouchebaba (1)
  • G. Nicolescu (1)
  • E. M. Aboulhamid (2)
  • P. Paulin (3)
  • B. Lavigueur (3)

  1. École Polytechnique de Montréal, Quebec, Canada
  2. Université de Montréal, Quebec, Canada
  3. STMicroelectronics, Ottawa, Canada
