Abstract
Parallelization of Digital Signal Processing (DSP) software is an important trend in Multiprocessor System-on-Chip (MPSoC) implementation. The performance of DSP systems composed of parallelized computations depends on the scheduling technique, which must in general allocate computation and communication resources among competing tasks and ensure that data dependencies are satisfied. In this paper, we formulate a new type of parallel task scheduling problem, called Parallel Actor Scheduling (PAS), for MPSoC mapping of DSP systems that are represented as Synchronous Dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph-level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We first address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized are parallel actors (i.e., they can be parallelized to exploit multiple cores). For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three workflows that involve particle swarm optimization (PSO): PSO with a mixed integer programming formulation, PSO with simulated annealing, and PSO with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support the general PAS problem, which considers both parallel actors and sequential actors (actors that cannot be parallelized) in an integrated manner. We demonstrate that our PAS-targeted scheduling framework provides a useful range of trade-offs between synthesis time requirements and the quality of the derived solutions.
We also evaluate the performance of our scheduling framework in two ways: through simulations on a diverse set of randomly generated SDF graphs, and through implementations of an image processing application and a software defined radio benchmark on a state-of-the-art multicore DSP platform.
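To make the combination of intra- and inter-actor parallelism concrete, the following is a minimal Python sketch of a generic greedy list scheduler for parallel actors. It is not the heuristic developed in the paper. It assumes that a core count has already been fixed for each actor (for example, by some earlier optimization step), that the cores are identical, and that communication costs are ignored. The function name `list_schedule`, its parameters, the longest-first priority rule, and the four-actor example graph are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def list_schedule(
    exec_time: Dict[str, int],
    cores: Dict[str, int],
    deps: Dict[str, List[str]],
    num_cores: int,
) -> Dict[str, Tuple[int, int]]:
    """Greedy list scheduling of parallel actors on identical cores.

    exec_time[a] is the run time of actor a when it uses cores[a] cores
    (assumed to be fixed beforehand); deps[a] lists the predecessors of a.
    Returns a mapping from each actor to its (start, finish) times.
    """
    finish: Dict[str, int] = {}          # completion times of finished actors
    running: List[Tuple[int, str]] = []  # (finish time, actor) for active actors
    schedule: Dict[str, Tuple[int, int]] = {}
    pending = set(exec_time)
    free = num_cores
    time = 0

    while pending or running:
        # Ready actors: all predecessors have finished by the current time.
        # Illustrative priority rule: longest execution time first, then name.
        ready = sorted(
            (a for a in pending if all(p in finish for p in deps.get(a, []))),
            key=lambda a: (-exec_time[a], a),
        )
        for a in ready:
            if cores[a] <= free:  # start the actor only if enough cores are idle
                schedule[a] = (time, time + exec_time[a])
                running.append((time + exec_time[a], a))
                free -= cores[a]
                pending.discard(a)
        if not running:
            raise ValueError("cyclic dependencies or an actor needs too many cores")
        # Advance time to the next completion and release the freed cores.
        running.sort()
        time = running[0][0]
        while running and running[0][0] == time:
            _, done = running.pop(0)
            finish[done] = time
            free += cores[done]
    return schedule


# Hypothetical four-actor graph on a four-core platform.
example = list_schedule(
    exec_time={"A": 3, "B": 2, "C": 1, "D": 2},
    cores={"A": 2, "B": 2, "C": 4, "D": 1},
    deps={"C": ["A", "B"]},
    num_cores=4,
)
makespan = max(end for _, end in example.values())
```

In this example, actors A and B run side by side on two cores each (inter-actor parallelism), while actor C occupies all four cores once its predecessors have finished (intra-actor parallelism). A different choice of core counts per actor would lead to a different schedule, which is why allocation and scheduling need to be considered together.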
Cite this article
Zhou, Z., Plishker, W., Bhattacharyya, S.S. et al. Scheduling of Parallelized Synchronous Dataflow Actors for Multicore Signal Processing. J Sign Process Syst 83, 309–328 (2016). https://doi.org/10.1007/s11265-014-0956-2
DOI: https://doi.org/10.1007/s11265-014-0956-2