Reconfigurable devices can adapt the structure of the underlying hardware architecture, even at runtime, to the specific requirements of a given application. This can lead to tremendous boosts in performance and energy efficiency compared to conventional architectures [1]. The articles in this special issue feature different types of reconfigurable devices, including field-programmable gate arrays (FPGAs), coarse-grained reconfigurable arrays (CGRAs), and configurable heterogeneous platforms. The application areas are manifold and include image, video, and other digital signal processing, scientific computing, and near-data processing for databases, to name only a few that are also considered in this special issue.

This special issue of Springer’s Journal of Signal Processing Systems covers various facets of the above-mentioned topics. The special issue is based on extended versions of selected top-level papers presented at the 17th International Symposium on Applied Reconfigurable Computing (ARC). The 2021 edition of ARC [2] was hosted by the Université de Rennes 1 and Inria, France, and took place as a virtual conference on June 29–30, 2021. Out of 36 submitted papers, 14 regular papers were presented at ARC 2021, and after a careful peer-review process, four extended manuscripts were accepted for inclusion in this special issue. It is our pleasure to briefly introduce these articles in the following.

The first two articles of this special issue deal with configurable heterogeneous systems.

Heinz and Koch’s article “On-Chip and Distributed Dynamic Parallelism for Task-based Hardware Accelerators” [3] presents TaPaSCo, an open-source hardware/software framework for FPGA-based architectures. The framework provides a task-based programming model that allows the integration of FPGA-based accelerators into a heterogeneous platform, including the scheduling of communication between the host and the accelerator system. TaPaSCo features the distribution of tasks across multiple FPGAs as well as the scheduling and dispatching of tasks in hardware, i.e., on the FPGA chip. The hardware-assisted scheduler can speed up the launching of tasks by a factor of 35 compared with previous approaches.

The second article, entitled “Integrating Energy-Optimizing Scheduling of Moldable Streaming Tasks with Design Space Exploration for Multiple Core Types on Configurable Platforms” by Keller et al. [4], deals with the mapping and scheduling of task graphs onto configurable heterogeneous platforms. The proposed approach combines design space exploration with static scheduling of streaming applications. The primary design objective is energy minimization for a given throughput constraint. The scheduling problem is formulated as an integer linear program (ILP), which incorporates the geometry (size) of the different cores as a further constraint. Among other things, the placement of different core types (little and big cores) on an FPGA is solved theoretically using the proposed approach.

The following two articles deal with application placement and mapping onto FPGAs and CGRAs, respectively.

The article “Energy Efficient Hardware Loop Based Optimization for CGRAs” by Sunny et al. [5] deals with the efficient parallelization and mapping of nested loop programs onto CGRAs. Well-known software techniques, such as (partial) loop unrolling, are combined with a centralized hardware loop block (HLB), in contrast to one HLB per processing element in previous approaches. The HLB keeps track of loop iterations and bounds and thus provides zero-overhead looping. Loop unrolling combined with the centralized HLB approach achieves a significant reduction in the number of executed instructions and in energy compared to previous work by the authors.

Pfau, Zaki, and Becker provide the last article, “V-FPGAs: Increasing Performance with Manual Placement, Timing Extraction and Extended Timing Modeling” [6]. V-FPGA stands for virtual FPGA, a concept that provides a vendor-independent virtualization layer. The authors propose strategies to increase the uniformity of a placement and thus also achieve higher clock frequencies. The proposed concepts are implemented using the open-source Versatile Place and Route (VPR) tool. Further, a framework for automated timing extraction is provided, which enables the characterization of a specific V-FPGA design. Finally, the approach is compared with strategies employed in Xilinx Vivado.

We are very grateful to the editor-in-chief, Sun-Yuan Kung, and the co-editors-in-chief, Shuvra S. Bhattacharyya and Jarmo Takala, of Springer’s Journal of Signal Processing Systems, for their responsive and pleasant communication. We also acknowledge the administrative staff for their valuable support throughout the preparation and publication of this special issue. Furthermore, we thank all authors for their excellent efforts and contributions to this special issue. Finally, we thank all the reviewers for their careful work and valuable suggestions, which helped improve the quality of the articles.

We hope you will enjoy reading this special issue.