In recent years, there has been an increasing need to develop new implementation techniques and design methodologies for DSP systems. Algorithmic and architectural optimizations are key to developing high-performance signal and information processing systems under strict constraints on implementation complexity and power consumption.

This special issue is composed of a selection of papers reporting on advances in the design and implementation of signal processing systems. The topics range from domain-specific hardware implementation to design methodologies for signal processing algorithm implementations.

In “An energy-efficient Reconfigurable ASIP supporting multi-mode MIMO detection,” (10.1007/s11265-015-0972-x) Ahmad, Li, Amin, Li, Van der Perre, Lauwereins, and Pollin present a programmable ASIP for MIMO baseband processing. They first present an efficient modification of the Multi-Tree Selective Spanning Detector algorithm and then introduce a soft-output algorithm, called counter-ML bit-flipping, for generating log-likelihood ratios. The C-programmable ASIP, designed in 40 nm CMOS, achieves throughputs of 3.6 Gbps for hard-output MIMO detection and 2.05 Gbps for soft-output detection.
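The counter-ML bit-flipping algorithm itself is not reproduced here. As a hedged illustration of how log-likelihood ratios can be obtained from an ML decision and its counter-hypotheses, the sketch below computes max-log LLRs for a toy real-valued BPSK MIMO system, assuming that each counter-hypothesis is formed by flipping a single bit of the ML vector. The functions `ml_detect` and `bitflip_llrs`, the bit mapping, and the real Gaussian noise model are assumptions made for illustration.

```python
import itertools

import numpy as np


def ml_detect(H, y):
    """Exhaustive ML detection for real-valued BPSK symbols s in {+1, -1}^Nt."""
    nt = H.shape[1]
    best_s, best_d = None, np.inf
    for cand in itertools.product((1.0, -1.0), repeat=nt):
        s = np.array(cand)
        d = float(np.sum((y - H @ s) ** 2))  # squared Euclidean distance
        if d < best_d:
            best_s, best_d = s, d
    return best_s, best_d


def bitflip_llrs(H, y, noise_var):
    """Max-log LLRs using single-bit flips of the ML vector as counter-hypotheses.

    Assumed bit mapping: b = 0 <-> s = +1, b = 1 <-> s = -1.
    A positive LLR favours b = 0.
    """
    s_ml, d_ml = ml_detect(H, y)
    llrs = np.empty(len(s_ml))
    for i in range(len(s_ml)):
        s_cf = s_ml.copy()
        s_cf[i] = -s_cf[i]  # flip bit i to form the counter-hypothesis
        d_cf = float(np.sum((y - H @ s_cf) ** 2))
        magnitude = (d_cf - d_ml) / (2.0 * noise_var)
        # The sign follows the ML decision for bit i.
        llrs[i] = magnitude if s_ml[i] > 0 else -magnitude
    return llrs


H = np.eye(2)
y = np.array([0.9, -1.1])
print(bitflip_llrs(H, y, noise_var=0.5))  # approximately [ 3.6 -4.4]
```

For H = I the per-antenna distances decouple, so a single flip gives the exact max-log LLR; for a general channel the best counter-hypothesis may differ from the ML vector in more than one bit, and single flips only approximate it.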

Tripakis, Limaye, Ravindran, Wang, Andrade, and Ghosal consider models of dataflow computation in their paper “Tokens vs. Signals: On Conformance between Formal Models of Dataflow and Hardware” (10.1007/s11265-015-0971-y). They define a formal conformance relation between finite state machines with synchronous semantics and a formal model of dataflow consisting of asynchronous processes that communicate via queues. This conformance relation can help determine the accuracy of hardware models, highlight timing and synchronization errors, and derive performance properties.

In “A dynamic modulo scheduling with binary translation: Loop optimization with software compatibility,” (10.1007/s11265-015-0974-8) Ferreira, Denver, Pereira, Wong, Lisboa, and Carro propose a binary translation technique for run-time modulo scheduling of loops onto coarse-grained reconfigurable arrays (CGRAs). The technique eliminates the need to generate an intermediate dataflow graph (DFG) and uses a greedy placement step. Experimental results show that the run-time technique can achieve higher instruction-level parallelism than a 16-issue VLIW processor.
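The authors' binary-translation scheduler is not reproduced here. To illustrate what modulo scheduling with a greedy placement step involves, the sketch below starts from the resource-constrained lower bound on the initiation interval, ResMII = ceil(operations / processing elements), and places each operation at the earliest cycle whose slot in a modulo reservation table still has a free processing element. It assumes an acyclic loop body (no loop-carried dependences, so no recurrence bound), unit-latency operations, and identical, fully connected processing elements; real CGRA routing constraints are ignored, and all names are hypothetical.

```python
import math


def greedy_modulo_schedule(ops, deps, num_pes, latency=1):
    """Greedy modulo scheduling of an acyclic loop body onto num_pes identical PEs.

    ops:  operation names in topological order.
    deps: dict mapping an operation to the operations it depends on.
    Returns (ii, start_times).
    """
    # Resource-constrained lower bound on the initiation interval (ResMII).
    ii = math.ceil(len(ops) / num_pes)
    while True:
        mrt = [0] * ii  # modulo reservation table: busy PEs per slot (cycle mod II)
        start = {}
        for op in ops:
            # Earliest cycle allowed by intra-iteration dependences.
            t = max((start[p] + latency for p in deps.get(op, [])), default=0)
            # Greedy placement: scan at most II consecutive cycles for a free slot.
            for _ in range(ii):
                if mrt[t % ii] < num_pes:
                    break
                t += 1
            else:
                break  # every slot is full at this II
            mrt[t % ii] += 1
            start[op] = t
        if len(start) == len(ops):
            return ii, start
        ii += 1  # retry with a larger initiation interval


ops = ["ld_a", "ld_b", "mul", "add", "st"]
deps = {"mul": ["ld_a", "ld_b"], "add": ["mul"], "st": ["add"]}
print(greedy_modulo_schedule(ops, deps, num_pes=2))
# (3, {'ld_a': 0, 'ld_b': 0, 'mul': 1, 'add': 2, 'st': 4})
```

The store lands in cycle 4 rather than 3 because slot 0 (cycle 3 mod 3) is already occupied by both loads; overlapping iterations share the table, which is what lets a new iteration start every II cycles.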

Akin, Franchetti, and Hoe present restructured fast Fourier transform (FFT) algorithms with efficient memory access patterns in their paper “FFTs with Near-Optimal Memory Access Through Block Data Layouts: Algorithm, Architecture and Design Automation” (10.1007/s11265-015-1018-0). They use a formal, Kronecker-product-based representation of the FFT to automatically generate hardware implementations of DRAM-optimized FFT algorithms. Results for 1D, 2D, and 3D FFTs show that their designs can achieve performance close to the theoretical peak on several different platforms.
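As an illustration of the kind of Kronecker-product formulation mentioned above, the sketch below builds the textbook Cooley-Tukey factorization DFT_mn = (DFT_m ⊗ I_n) T^mn_n (I_m ⊗ DFT_n) L^mn_m as dense NumPy matrices and checks it against `numpy.fft.fft`. Here T^mn_n is the diagonal twiddle-factor matrix and L^mn_m the stride permutation. This is the standard rule, not the authors' DRAM-optimized block-data-layout algorithm; the helper names are hypothetical, and dense matrices are used only for clarity.

```python
import numpy as np


def dft_matrix(n):
    """DFT_n with entries w_n^(k*l), where w_n = exp(-2*pi*i/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def stride_perm(m, n):
    """L^{mn}_m: output position i*n + j takes input element j*m + i."""
    N = m * n
    P = np.zeros((N, N))
    for i in range(m):
        for j in range(n):
            P[i * n + j, j * m + i] = 1.0
    return P


def twiddles(m, n):
    """T^{mn}_n: diagonal matrix with entry w_{mn}^(i*j) at position i*n + j."""
    i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    return np.diag(np.exp(-2j * np.pi * (i * j).ravel() / (m * n)))


def cooley_tukey_matrix(m, n):
    """DFT_{mn} = (DFT_m kron I_n) T^{mn}_n (I_m kron DFT_n) L^{mn}_m."""
    return (np.kron(dft_matrix(m), np.eye(n))        # m-point DFTs at stride n
            @ twiddles(m, n)                          # twiddle-factor scaling
            @ np.kron(np.eye(m), dft_matrix(n))       # m independent n-point DFTs
            @ stride_perm(m, n))                      # data reordering


x = np.random.default_rng(0).standard_normal(12)
print(np.allclose(cooley_tukey_matrix(3, 4) @ x, np.fft.fft(x)))  # True
```

In this notation the data reordering (L) is an explicit factor separate from the arithmetic stages, which is the kind of structure a formal rewriting approach can manipulate when targeting a particular memory access pattern.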

In “Analyzing the Performance-Hardware Trade-off of an ASIP-based SIFT Feature Extraction,” (10.1007/s11265-015-0986-4) Mentzer, Payá-Vayá, and Blume consider the implementation of the Scale-Invariant Feature Transform (SIFT), which is used in computer vision. The complexity of the SIFT algorithm is too high for real-time implementation on CPUs, so the authors consider implementation on an Application-Specific Instruction-set Processor (ASIP). They develop instruction set extensions for the ASIP and demonstrate a 125× speedup compared with the baseline processor.

Yli-Kaakinen and Renfors propose an approach to optimizing fast-convolution (FC) filter banks in their paper “Optimization of Flexible Filter Banks Based on Fast Convolution” (10.1007/s11265-015-1004-6). Because FC filter banks (FC-FBs) offer greater flexibility than conventional polyphase implementations, multirate filter banks can be implemented efficiently using FC-FBs. The authors first derive a subband representation of the FC-FB, then formulate the optimization problems, and finally solve these problems using a general nonlinear optimization algorithm. Several examples demonstrate the proposed design scheme and illustrate the efficiency and flexibility of the resulting FC-FBs.
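The fast-convolution principle can be illustrated with basic overlap-save filtering: a single FIR filter applied by multiplying overlapping FFT blocks by the filter's frequency response. This sketch assumes one channel with a fixed FIR response; an FC filter bank builds on this block-wise frequency-domain processing by applying frequency-domain weights to individual subbands, and that multirate structure and its optimization are not modeled here. The function name and default block size are illustrative.

```python
import numpy as np


def overlap_save(x, h, nfft=64):
    """Linear convolution of x with FIR filter h via overlap-save fast convolution.

    Consecutive nfft-sample blocks overlap by len(h) - 1 samples; the first
    len(h) - 1 outputs of each circular convolution are corrupted by
    wrap-around and are discarded.
    """
    m = len(h)
    step = nfft - (m - 1)  # new (alias-free) output samples per block
    if step <= 0:
        raise ValueError("nfft must exceed len(h) - 1")
    H = np.fft.fft(h, nfft)  # frequency-domain filter weights
    n_out = len(x) + m - 1   # length of the full linear convolution
    n_blocks = -(-n_out // step)  # ceiling division
    # Prepend m - 1 zeros (initial filter state) and zero-pad the tail so
    # that every block has exactly nfft samples.
    padded = np.concatenate([np.zeros(m - 1), x, np.zeros(n_blocks * step - len(x))])
    out = []
    for b in range(n_blocks):
        block = padded[b * step: b * step + nfft]
        y_circ = np.fft.ifft(np.fft.fft(block) * H)  # circular convolution
        out.append(y_circ[m - 1:])  # keep only the alias-free samples
    y = np.concatenate(out)[:n_out]
    return y.real if np.isrealobj(x) and np.isrealobj(h) else y


rng = np.random.default_rng(1)
x = rng.standard_normal(200)
h = rng.standard_normal(17)
print(np.allclose(overlap_save(x, h), np.convolve(x, h)))  # True
```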

In “Fast Integer Word-length Optimization for Fixed-point Systems,” (10.1007/s11265-015-0990-8) Nehmeh, Menard, Nogues, Banciu, Michel, and Rocher first introduce a new selective simulation technique to accelerate the analysis of overflow effects, and then propose a new integer word-length optimization algorithm that exploits this technique to reduce both implementation cost and optimization time. Experiments show that the selective simulation technique speeds up execution by factors of up to 1200 and 1000 when applied to a Global Positioning System application and to the FFT part of an orthogonal frequency-division multiplexing (OFDM) chain, respectively. Moreover, when applied to the FFT part, the proposed optimization algorithm yields a 17 %–22 % cost reduction with respect to interval arithmetic and a speedup of up to 617 compared with the classical max-1 algorithm.
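One reason simulation-based range analysis can reduce cost relative to interval arithmetic is that interval arithmetic ignores correlations between signals and can therefore overestimate ranges. The hypothetical toy example below, not taken from the paper, shows this for y = x - 0.5x with x in [-1, 1] and derives an integer word-length from each range using a conservative rule, floor(log2(max|y|)) + 2 bits including the sign bit (an assumption). The selective simulation technique itself is not reproduced.

```python
import math

import numpy as np


def iwl(max_abs):
    """Conservative integer word-length (sign bit included) for |value| <= max_abs.

    Uses floor(log2(max_abs)) + 2 bits, so every magnitude up to max_abs is
    strictly below 2**(iwl - 1). Requires max_abs > 0.
    """
    return math.floor(math.log2(max_abs)) + 2


# Hypothetical expression y = x - 0.5 * x with input x in [-1, 1].
x_lo, x_hi = -1.0, 1.0

# Interval arithmetic treats both occurrences of x as independent:
# [x_lo, x_hi] - [0.5*x_lo, 0.5*x_hi] = [x_lo - 0.5*x_hi, x_hi - 0.5*x_lo]
ia_lo, ia_hi = x_lo - 0.5 * x_hi, x_hi - 0.5 * x_lo  # [-1.5, 1.5]

# Simulation evaluates the expression on actual samples, so the correlation
# between the two terms is captured and the true range [-0.5, 0.5] is observed.
samples = np.linspace(x_lo, x_hi, 1001)
sim = samples - 0.5 * samples
sim_max = float(np.max(np.abs(sim)))

print(iwl(max(abs(ia_lo), abs(ia_hi))), iwl(sim_max))  # 2 1
```

Simulation only observes the inputs it is given, so its ranges are tight but not guaranteed, whereas interval arithmetic is guaranteed but pessimistic; this trade-off is why the cost of simulation, and techniques to accelerate it, matter.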

Senning, Karakonstantis, and Burg propose a cross-layer optimization that minimizes the energy consumed per information bit in their paper “Cross-layer Energy-Efficiency Optimization of Packet Based Wireless MIMO Communication Systems” (10.1007/s11265-015-1003-7). The optimization combines energy-aware rate adaptation with an adjustable physical layer. Together, the proposed rate adaptation and physical-layer modifications improve the energy efficiency of an IEEE 802.11n system by up to 44 %.
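The paper's energy and error models are not reproduced here. As a hedged illustration of energy-aware rate adaptation, the sketch below picks, from a table of operating modes, the mode that minimizes the energy per successfully delivered information bit, modeled here (an assumption) as E_b = P / (R (1 - PER)). The `Mode` fields and all numbers are hypothetical and serve only to show the selection rule.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    rate_mbps: float  # data rate in Mbit/s
    power_mw: float   # average power while active, in mW
    per: float        # packet error rate under the current channel


def energy_per_info_bit_nj(mode):
    """Energy per successfully delivered information bit, in nJ.

    mW / (Mbit/s) = nJ/bit; dividing by (1 - PER) accounts for lost packets.
    """
    return mode.power_mw / (mode.rate_mbps * (1.0 - mode.per))


def select_mode(modes):
    """Energy-aware rate adaptation: choose the mode with the lowest E_b."""
    return min(modes, key=energy_per_info_bit_nj)


# Hypothetical operating points, not measured data.
modes = [
    Mode("low rate", 13.0, 400.0, 0.01),
    Mode("mid rate", 52.0, 450.0, 0.05),
    Mode("high rate", 130.0, 500.0, 0.60),
]
print(select_mode(modes).name)  # mid rate
```

With these numbers the highest rate is not the most energy-efficient choice because its packet error rate wastes most of the transmitted energy, which is the kind of trade-off an energy-aware policy exploits rather than simply maximizing throughput.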

In “Automated Design Flow for Multi-Functional Dataflow-Based Platforms,” (10.1007/s11265-015-1026-0) Sau, Raffo, Palumbo, Casale-Brunet, Bezati, Mattavelli, and Meloni present an integrated design flow that derives optimized multi-functional platforms directly from disjoint high-level specifications. The design flow leverages an integrated set of independently designed tools, all of which support the MPEG Reconfigurable Video Coding (RVC) standard. The results show that this approach can yield a reconfigurable design that preserves the original performance of the stand-alone, non-reconfigurable platform while providing considerable area savings and supporting a larger set of functionalities.

Aghababaeetafreshi, Lehtonen, Levanen, Valkama, and Takala present a software-based implementation of multiple-input multiple-output (MIMO) transmitter and receiver baseband processing conforming to the IEEE 802.11ac standard in their paper “IEEE 802.11ac MIMO Transceiver Baseband Processing on a VLIW Processor” (10.1007/s11265-015-1032-2). The feasibility of the software-based solution is evaluated by studying the number of clock cycles and the power consumption in different scenarios. Compared with conventional fixed-function hardware, software-defined radio approaches can potentially offer more flexibility, high energy efficiency, and reduced design effort, and thus shorter time to market.