List Scheduling in Embedded Systems Under Memory Constraints

Arras, Paul-Antoine; Fuin, Didier; Jeannot, Emmanuel; Stoutchinin, Arthur; Thibault, Samuel

doi:10.1007/s10766-014-0338-1

List Scheduling in Embedded Systems Under Memory Constraints

Published: 12 November 2014

Volume 43, pages 1103–1128, (2015)
Cite this article

International Journal of Parallel Programming Aims and scope Submit manuscript

Paul-Antoine Arras^1,2,3,
Didier Fuin²,
Emmanuel Jeannot¹,
Arthur Stoutchinin² &
…
Samuel Thibault^1,3

288 Accesses
4 Citations
Explore all metrics

Abstract

Video decoding and image processing in embedded systems are subject to strong resource constraints, particularly in terms of memory. List-scheduling heuristics with static priorities (HEFT, SDC, etc.) being the oft-cited solutions due to both their good performance and their low complexity, we propose a method aimed at introducing the notion of memory into them. Moreover, we show that through adequate adjustment of task priorities and judicious resort to insertion-based policy, speedups up to 20 % can be achieved. We also show that our technique allows to prevent deadlock and to substantially reduce the required memory footprint compared to classic list-scheduling heuristics. Lastly, we propose a methodology to assess the appropriateness of dynamic scheduling in this context.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Reference frame context-adaptive variable-length coder: a real-time hardware-friendly approach for lossless external memory bandwidth reduction in current video-coding systems

Article 09 August 2014

Dieison Silveira, Bruno Zatt, … Marcelo Porto

Performance analysis of optimized versatile video coding software decoders on embedded platforms

Article Open access 31 October 2023

Anup Saha, Wassim Hamidouche, … Ibrahim Farhat

Low-energy motion estimation memory system with dynamic management

Article 11 June 2021

Dieison Soares Silveira, Lívia Amaral, … Sergio Bampi

Notes

The bottom level is also sometimes referred to as the upward rank.
The orders of magnitude of the latency for local and external memories are respectively 1 cycle and 100 cyles.
A token is the smallest unit of data that can be processed by a task. It is application specific; e.g. for an image-processing algorithm, it can be a line of pixels.
Thus, from a processing viewpoint, pixel lines are independent.
If a node can be pebbled more than once, then the problem is PSPACE-complete (and hence NP-Hard), but probably not in NP [10].
As a reminder, the bigger its priority value, the earlier a task is scheduled.
The sole purpose of this reference schedule is to serve as a basis to construct the self-timed schedule. Thus, the makespans resulting from it are not meaningful for our study and, as such, are not represented.
To keep the model simple, interprediction is not considered.

References

Adam, T.L., Chandy, K., Dickson, J.: Comparison of list schedules for parallel processing systems. Commun. ACM 17(12), 685–690 (1974)
Article MATH Google Scholar
Baker, T.P.: Stack-based scheduling for realtime processes. Real-Time Syst. 3(1), 84–88 (1991)
Batat, A., Feitelson, D.: Gang scheduling with memory considerations. In: Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pp. 109–114. IEEE (2000)
Benini, L., Flamand, E., Fuin, D., Melpignano, D.: P2012: building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator. In: Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pp. 983–987 (2012)
Buck, J., Lee, E.: Scheduling dynamic dataflow graphs with bounded memory using the token flow model. In: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), vol. 1, pp. 429–432. IEEE (1993)
Canon, L.C., Jeannot, E., Sakellariou, R., Zheng, W.: Comparative evaluation of the robustness of dag scheduling heuristics. In: Gorlatch, S., Fragopoulou, P., Priol, T. (eds.) Grid Computing, pp. 73–84. Springer, US (2008). doi:10.1007/978-0-387-09457-1_7
Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E., Markstein, P.W.: Register allocation via coloring. Comput. Lang. 6, 47–57 (1981)
Article Google Scholar
Fradet, P., Girault, A., Poplavkoy, P.: SPDF: a schedulable parametric data-flow MoC. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, p. 769774 (2012)
Geng, T., et al.: Parallelization of computing-intensive tasks of the h.264 high profile decoding algorithm on a reconfigurable multimedia system. IEICE Trans. Inf. Syst. E93–D(12), 3223–3231 (2010)
Article Google Scholar
Gilbert, J., Lengauer, T., Tarjan, R.: The pebbling problem is complete in polynomial space. SIAM J. Comput. 9(3), 513–524 (1980)
Article MATH MathSciNet Google Scholar
Guermouche, A., L’Excellent, J.Y.: Memory-based scheduling for a parallel multifrontal solver. In: Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, p. 71. IEEE (2004)
Herrmann, J., Marchal, L., Robert, Y.: Model and complexity results for tree traversals on hybrid platforms. In: Euro-Par 2013 Parallel Processing, pp. 647–658. Springer (2013)
Herrmann, J., Marchal, L., Robert, Y.: Memory-aware list scheduling for hybrid platforms. Rapport de recherche RR-8461, INRIA (2014). http://hal.inria.fr/hal-00944336
Jian, G.A., Chu, J.C., Huang, T.Y., Chang, T.C., Guo, J.I.: A system architecture exploration on the configurable HW/SW co-design for h.264 video decoder. In: Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on, pp. 2237–2240 (2009). doi:10.1109/ISCAS.2009.5118243
Kwok, Y.K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999). doi:10.1145/344588.344618
Article Google Scholar
Lawler, E.L., Lenstra, J.K., Kan, A.R., Shmoys, D.B.: Sequencing and scheduling: algorithms and complexity. Handb. Oper. Res. Manag. Sci. 4, 445–522 (1993)
Article Google Scholar
Lee, E., Messerschmitt, D.: Synchronous data flow. Proc. IEEE 75(9), 1235–1245 (1987)
Article Google Scholar
Lee, E.A., Ha, S.: Scheduling strategies for multiprocessor real-time DSP. IEEE Global Telecommunications inproceedings and Exhibition 2 (1989)
Liu, J.W.: On the storage requirement in the out-of-core multifrontal method for sparse factorization. ACM Trans. Math. Softw. TOMS 12(3), 249–264 (1986)
Article MATH Google Scholar
Marchal, L., Sinnen, O., Vivien, F.: Scheduling tree-shaped task graphs to minimize memory and makespan. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 839–850. IEEE (2013)
Melpignano, D., Benini, L., Flamand, E., Jego, B., Lepley, T., Haugou, G., Clermidy, F., Dutoit, D.: Platform 2012, a many-core computing accelerator for embedded socs: performance evaluation of visual analytics applications. In: Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pp. 1137–1142 (2012)
Saponara, S., et al.: Performance and complexity co-evaluation of the advanced video coding standard for cost-effective multimedia communications. EURASIP J. Appl. Signal Process. 2004(2), 220–235 (2004)
Article Google Scholar
Sethi, R.: Complete register allocation problems. SIAM J. Comput. 4(3), 226–248 (1975)
Article MATH MathSciNet Google Scholar
Shi, Z., Dongarra, J.J.: Scheduling workflow applications on processors with different capabilities. Futur. Gener. Comput. Syst. 22(6), 665–675 (2006). doi:10.1016/j.future.2005.11.002
Article Google Scholar
Sih, G.C., Lee, E.A.: Compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. Parallel Distrib. Syst. 4(2), 175–187 (1993)
Article Google Scholar
Sullivan, G., Ohm, J.R.: Recent developments in standardization of high efficiency video coding (HEVC). In: Proceedings of SPIE: The International Society for Optical Engineering, vol. 7798 (2010)
Topcuoglu, H., Hariri, S., Wu, M.Y.: Task scheduling algorithms for heterogeneous processors. In: 8th IEEE Heterogeneous Computing Workshop (HCW’99), pp. 3–14. San Juan, Puerto Rico (1999)
Wang, S.H., et al.: A software-hardware co-implementation of mpeg-4 advanced video coding (AVC) decoder with block level pipelining. J. VLSI Signal Process. Syst. Signal Image Video Technol. 41(1), 93–110 (2005)
Article Google Scholar
Wiegand, T., et al.: Overview of the h.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Inria Bordeaux Sud-Ouest, Talence, France
Paul-Antoine Arras, Emmanuel Jeannot & Samuel Thibault
STMicroelectronics, Grenoble, France
Paul-Antoine Arras, Didier Fuin & Arthur Stoutchinin
University of Bordeaux, Bordeaux, France
Paul-Antoine Arras & Samuel Thibault

Authors

Paul-Antoine Arras
View author publications
You can also search for this author in PubMed Google Scholar
Didier Fuin
View author publications
You can also search for this author in PubMed Google Scholar
Emmanuel Jeannot
View author publications
You can also search for this author in PubMed Google Scholar
Arthur Stoutchinin
View author publications
You can also search for this author in PubMed Google Scholar
Samuel Thibault
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Paul-Antoine Arras.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Arras, PA., Fuin, D., Jeannot, E. et al. List Scheduling in Embedded Systems Under Memory Constraints. Int J Parallel Prog 43, 1103–1128 (2015). https://doi.org/10.1007/s10766-014-0338-1

Download citation

Received: 03 January 2014
Accepted: 10 October 2014
Published: 12 November 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10766-014-0338-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

List Scheduling in Embedded Systems Under Memory Constraints

Abstract

Access this article

Similar content being viewed by others

Reference frame context-adaptive variable-length coder: a real-time hardware-friendly approach for lossless external memory bandwidth reduction in current video-coding systems

Performance analysis of optimized versatile video coding software decoders on embedded platforms

Low-energy motion estimation memory system with dynamic management

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

List Scheduling in Embedded Systems Under Memory Constraints

Abstract

Access this article

Similar content being viewed by others

Reference frame context-adaptive variable-length coder: a real-time hardware-friendly approach for lossless external memory bandwidth reduction in current video-coding systems

Performance analysis of optimized versatile video coding software decoders on embedded platforms

Low-energy motion estimation memory system with dynamic management

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation