Synthesis and Realization of 3-D Orthogonal FIR Filters Using Pipeline Structures

The authors present a novel design algorithm for 3-D orthogonal filters. Both separable and non-separable cases are discussed. In the separable case, the synthesis leads to a cascade connection of 1-D systems. In the latter case, one obtains 2-D systems followed by a 1-D one. Realization techniques for these systems are presented which utilize Givens rotations and delay elements. The results are illustrated by examples of separable and non-separable 3-D system designs, i.e., Gaussian and Laplacian filters.


Introduction
Since the first pulse-code modulation transmission of digitally quantized speech, in World War II, digital signal processing (DSP) began to proliferate to all areas of human life. A classic DSP is based on linear systems described by impulse response functions and transfer functions implemented by structures built from adders, multipliers, and unit delays. Another approach was initiated by [31], known as the state space approach. It was also extended to the 2-D case by Roesser [18], as well as to three-dimensional (3-D) [9]. The steady increase in computational power encourages applying DSP techniques to multidimensional processing. However, the n-dimensional (n-D) DSP development has encountered difficulties caused by n-D polynomials [5]. Namely, B P. Poczekajło pawel.poczekajlo@tu.koszalin.pl R. T. Wirski robert.wirski@tu.koszalin.pl 1 Faculty of Electronics and Computer Science, Koszalin University of Technology, Koszalin, Poland there is no straightforward generalization of the fundamental theorem of algebra to higher dimensions. Classical digital systems are known to possess poor parameters under finite-precision arithmetic, like the sensitivity of the frequency response to changes in the structural parameters, noise, intrinsic oscillations, and limit cycles. These effects have led to the invention of wave filters [6] and orthogonal filters [2,3]. The most common approach to orthogonal filter synthesis is a transfer function decomposition and the state space approach. When it comes to multidimensional DSP, the former technique is of a limited use due to the n-D polynomials. In contrast, the latter provides an opportunity to extend 1-D state space techniques to higher dimensions, thanks to the 2-D, 3-D, and possibly n-D state space equations. The state space approach to lossless systems was initiated by the famous paper [30], where paraunitary matrix synthesis techniques were developed for the 2-D transfer function of a continuous system. The state space approach was also used to develop 2-D orthogonal filter synthesis [16] and simplified to cover a class of separable-denominator orthogonal filters [28] which found to be useful in real-time processing [21].
Nowadays, one can observe that a processed data becomes n-D like video, multichannel audio, machine vision, to name a few. The 3-D processing is especially important in medicine [1] and image/video processing [5,10] but also finds applications in other areas like material structures [12].
DSP synthesis is based on difference equations usually transformed by the Z transform. For the 3-D function f (x 1 , x 2 , x 3 ), this is given by where z 1 , z 2 , and z 3 are complex numbers. Linear time invariant filters are usually classified into recursive and non-recursive. The latter, called finite impulse response (FIR) filters, are very popular due to their simplicity and natural stability. Typically, they are described by a transfer function which is the Z transform of its impulse response. For the 1-D case, it is given by where a 0 , . . . , a n are real constant coefficients. In the 3-D case, the transfer function of an FIR filter extends to In this paper, we deal with a class of orthogonal filters. Introducing the energy of a 3-D real vector function f (x 1 , we define an orthogonal filter to be a system which preserves the energy, i.e., the input energy equals the output energy. Technically, we are about to find a net of Givens rotations which realizes a given transfer function between two ports of that net. The Givens rotation rotates a vector [u s , u t ] T by α radians, i.e., it implements the following set of equations (5) We will denote it by R s,t (α) and use the graphical symbol shown in (5). They are usually implemented using an iterative algorithm called CORDIC [11]. To obtain high throughput, we also utilise permutations in our structures, given by It is known that the following holds: Theorem 1 [7] Transfer function H (z 1 , z 2 , z 3 ) of a 3-D orthogonal filter satisfies The paraunitary condition (7) imposes the single-input single-output system to have a constant, unit frequency response. Typically, orthogonal systems have more inputs and outputs (at least two outputs) to implement practical non-flat frequency responses. During the synthesis process, we use a state space representation [17]. For given which are input, output, and state vector, respectively, it is defined by τ is a real constant partitioned matrix of the form 2-D state space equations, known as the Roesser model, are given by [18] ⎡ ⎣ are the 2-D input, output, and two state vectors in h and v directions, respectively, and can be represented in the following form It is easy to show that if τ is an orthogonal matrix, i.e., τ T τ = I , then the systems described by (9) and (11) are orthogonal filters. During synthesis, this property allow us to focus on the state space matrices orthogonality instead of the energy of signals. We say that a system described by the impulse response r (h, v, d) is separable if it can be represented in the form Otherwise, we call it non-separable. Nowadays, one can observe that most of n-D DSP designs are based on intuitive manipulation applied to a given computational algorithm like ordering multipliers and adders to speed up calculations. Most of the time, no other parameters, except speed and chip area occupation, are taking into account. Unfortunately, when one improves one parameter, another gets worse. This can readily be seen when comparing direct form structures of infinite impulse response 1-D digital filter (fast and inaccurate) and cascade ones (more accurate but output is delayed) for high orders. Similar problems are observed in active electronic filters compared to passive ones made of inductors, capacitors, and resistors, as well as in classic 1-D DSP direct form structures versus wave and orthogonal filters. There are several other parameters influencing the design performance like the sensitivity of the frequency response to changes in the structural parameters, noise, intrinsic oscillations, and limit cycles. So, the idea which motivated the authors in this paper is that one can design better n-D systems by incorporating the techniques which helped in 1-D domain like the orthogonal filters technique. A novelty of our approach is that we can design orthogonal 3-D filters consisting of Givens rotations. To the best of the authors' knowledge, no other techniques to synthesize 3-D rotation structures have yet been published. The scope of the paper is to present details of the synthesis algorithms for separable and non-separable systems. The main idea of our approach is to decompose a 3-D systems into a connection of lower dimension blocks and then apply previously elaborated, 1-D and 2-D orthogonal synthesis procedures [24,26].
The paper is organized as follows. In Sect. 2.1, we present details of the synthesis algorithm of separable orthogonal 3-D filters. Next, an example which illustrates that method is presented in Sect. 2.2 (3-D Gaussian filter). In Sect. 3.1, an expanded version of the technique for the non-separable case is proposed followed by its application in Sect. 3.2 (3-D Laplace filter). In the examples included in this paper, we utilize the standard sample by sample ordering which converts the 3-D signal into 2-D images processed image by image.

Synthesis Algorithm
For a separable 3-D system, we represent its transfer function (3) in the form Applying the 1-D synthesis technique, presented in [25], to , and H d (z d ) for which we obtain three 1-D state space realizations spanned on a 3-D domain: We also get scaling factors k h , k v and k d , which are the maxima of the frequency response squared for H h (z h ), H v (z v ), and H d (z d ), respectively. Then, each state space realization in (15) is implemented using pipeline structures H h , H v , and H d , respectively. Constructing a cascade connection of these three implementations, we obtain the separable 3-D orthogonal pipeline structure shown in Fig. 1.

Realization Example of a Gaussian Filter
Let us give a design example of a separable 3-D FIR Filter. We have chosen a Gaussian filter whose impulse response is given by Substituting (16) into (13), we have where By (14), we obtain where The constant n 0 is chosen to shift the origin of (19). The parameters for the filter have been taken from [1], i.e., σ = 0.7, n 0 = 2, and h, v, d = −2, −1, 0, 1, 2. Substituting them into (21), we obtain kernel coefficients of the Gaussian filter which are presented in Table 1. From (20) we see that it suffice to design a state space system by the technique presented in [25] applied to T (z), which is given by The synthesis algorithm is as follows. We construct a paraunitary transfer vector: where and k is a constant scalar chosen to make the factorization possible. We have chosen k = k h = k v = k d = 1.4547750 which is the maximum of the frequency response of |T (e jω )| 2 . Next, we represent (23) in the form: where: By applying QR to For (23), we determine the state space representation (9), We apply similarity transformation to (27) starting from Schur decomposition of A (to find upper triangular representation) which changes also B and C which leads to a new matrices A U ,B U , and C U Next, we extend A UBU C UD to be a square matrix [27,29] which leads to final state space realization, given by Applying the Givens decomposition and permutations to (29), we obtain the pipeline rotation structure shown in Fig. 2 whose parameters are presented in Table 2 (detailed algorithm of the decomposition is presented in [26]). To apply the obtained 3-D Gaussian filter to image processing, we need to link each independent variable h, v, and d with the dimensions of an image. To keep things simple, we assign h, v, and d to be indexes of the 3-D image in the horizontal (rows), vertical (columns), and depth (frames) dimensions, respectively. Suppose that an image frame is of size H × V . We assume that it is processed sample by sample   Fig. 2, need to be replaced with the following 1-D delays: z −1 , z −H , and z −H V , respectively. The 3-D Gaussian filter has been modelled in the Scilab environment [22]. The impulse response of the filter has been simulated for the 8 × 8 × 8 3-D Kronecker delta matrix input. Obtained results are similar to shown in Table 1 with mean-squared error which equals to 6.346 · 10 −23 . The filter has been tested with a real 3-D medical DICOM image taken from [4]. For this task, the authors have implemented Scilab procedures to read and write DICOM files.

Synthesis Algorithm
Suppose we are given a non-separable transfer function (3). To obtain a rotation structure in this case, we rearrange (3) into one of the following form: From (30), we have three ways to decompose the system into a cascade connection of 2-D and 1-D blocks. We have chosen (30a); the other two structures can be obtained in a similar way. The coefficients H j (z h , z d ) can be represented in the so-called Gram form where and Introducing the partitioned matrix we can represent (30a) in the form where Applying a full rank factorization [8] to (33), we obtain where C d and C e are real full rank (l + 1) × r and r × (n + 1) matrices, respectively. Substituting (36) into (34), we obtain and Each H e i (z h , z v ) is an single-input single-output system, so we can apply the algorithm presented in [27,28] to synthesize 2-D orthogonal state space equations, given by (11), spanned on 3-D. The factors k e i obtained from the 2-D synthesis are applied to the corresponding elements ofĤ d (z d ). The resulting vector will be denoted by Thanks to it, we remove r scaling factors from the rotation structure. As (40) is a horizontal vector, it cannot satisfy (7). So, to obtain an orthogonal structure, we take into account a transposed version of (40) [17, p. 546]. So, we can apply the algorithm illustrated in Sect. 2.2 to H T d (z d ) obtaining (10) which is a square matrix [27]. Then, the result is transposed back to the orthogonal realization of (40), which is 1-D state space equations (9) spanned on 3-D (dimensions h and v are not processed), given by where In (37)

Realization Example of a 3-D Laplace Filter
Let us design the 3-D orthogonal Laplace filter whose mask is given by [12] The transfer function for the system described by (43) is where and Applying a full rank factorization to (45), we obtain (37), where r = 2, and Let us focus on H e 1 (z h , z v ) realization, given by (48), for a moment. We represent H e 1 (z h , z v ) in the form: Applying the full rank decomposition to (50), we obtain: where For (52), we construct the paraunitary systems: We obtain 1-D state space realizations of (53) [in similar way as (24)- (28) in Sect. 2.2]: We add new columns toB t andD t , obtaining square matrix A t B t C t D t [29]: We extend the number of columns in (60): for which we can also get the pipeline rotation structure. In a similar way as in Sect.  Fig. 3. Replacing H e 1 , H e 2 , and H d with their orthogonal counterparts, we  4 Pipeline implementation of the 3-D orthogonal Laplace filter obtain the pipeline structure shown in Fig. 4. Each H e 1 , H e 2 , and H d are implemented using Givens rotations and permutations whose parameters are presented in Tables 4,  5, and 6. As they are multi-input multi-output orthogonal systems, there are extra inputs which will be set to zero as well as additional outputs will not be used. Hence, the first three rotations R 1 , R 2 , and R 3 , and the −1 multiplier in the implementation of H e 1 and rotations R 19 , R 20 , and R 21 , and the −1 multiplier in the implementation of H e 2 , have input and output constantly equal to zero. So, they can be removed from the structure. The final structure for the 3-D orthogonal Laplace filter is shown in Fig. 5.
If the H e i systems have unequal numbers of rotations, we apply extra delay elements to compensate for the processing time of all the H e i blocks.
The impulse response of the filter has been simulated for the 8×8×8 3-D Kronecker delta matrix input. The results are presented in Table 3.

Conclusions
The main contribution of the paper is the extension of 1-D and 2-D FIR orthogonal filters to 3-D case. By doing so, we open up new possibilities to take into account parameters which are usually omitted in 3-D designs like sensitivity of the frequency response to changes in structural parameters, noise, intrinsic oscillations, and limit cycles. This is the main difference from 3-D FIR techniques, known in literature, which usually focus on speed and chip area only. However, presented results are occupied by much higher mathematical burden during synthesis which calls for polynomial factorization, matrix pseudoinverse, QR decomposition, full rank matrix factorization, real Schur decomposition, and orthonormal basis extension. Nonetheless, these are standard numerical methods which can be implemented using popular mathematical software. The authors have applied Scilab [22] for these tasks. The frequency response of a separable system is limited to a superposition of, possibly different, 1-D functions applied to each direction independently. Due to that, a separable system cannot fully exploit a neighbourhood of a processed sample. On the other hand, their synthesis techniques in n-D are as simple as 1-D approaches applied n times. In the non-separable case, we are allowed to design systems which approximate any frequency response at a cost of much more complicated synthesis algorithms. The originality of our non-separable technique, presented in Sect. 3.1, is that we represent a 3-D system in the Gram form with coefficients collected in a matrix which is a subject of a full rank factorization. Thanks to it, we separate one variable from the system at the expense of an increase in the subsystem's inputs and outputs. As a result, we get a cascade connection of 2-D systems and a 1-D one. However, obtained structures are clearly 3-D, which means that their input and output are 3-D functions. So, to use them in real applications we need to apply any concurrent technique or sample by sample ordering to the signals and systems. The pipeline structures, obtained in Sects. 2.1 and 3.1, have high throughput at the expense of a latency which is not larger then (m 1 (l + 1) + m 2 (n + 1)(m + 1))τ R and (m 1 (l +1)+m 2 (n +1)(m +1))τ R for separable and non-separable cases, respectively, where τ R denotes the processing time of a Givens rotation and m 1 ≤ min{l + 1, (n + 1) · (m + 1)}, m 2 ≤ min{n + 1, m + 1}, for m,n, and l given by (30). So, they are well suited for hardware real-time 3-D image processing, but also may find applications in other areas where 3-D data are employed like in economy modelling, seismic data, weather forecast models, etc. It is also possible to utilise presented results as a digital replacement for analog controllers in control systems [13][14][15]19,20]. Due to the robust orthogonal systems properties, they will be especially applicable when high processing precision is required even for low bit quantization, as in medical imaging. One can give upper bounds for the number of Givens rotations and single delay elements. In the separable case, they are given by n + m + l + 3 and n + m H + l H V , respectively, for an H × V image frames, where m, n, and l are defined in (14). For the same image in the non-separable case, a structure has no more then m 1 (l + 1) + m 1 m 2 (n + 1)(m + 1) Givens rotations and m 1 (n + m H) + l H V single delay elements, for m,n, and l given by (30).
The 5 × 5 × 5 3-D Gaussian filter, presented in Sect. 2.2, has been implemented in DE2 development board with Cyclone II chip (EP2C35F672C6) running at 125 MHz clock rate [23]. In [10], a similar filter of order 3 × 3 × 3 has been presented using the same DE2 board of 100 MHz rate for RAM communication and 25 MHz for the filter module. The reported overall performance was 30 frames/s for 640 × 480 images. In [1], a 3-D anisotropic diffusion filter has been proposed which is a 5 × 5 × 5 3-D Gaussian filter. It was implemented in Stratix II chip (EP2S180F1508C4) using 7524 ALUTs and 20 DSP multipliers. Their design achieves voxel processing rate of 192-194 MHz for the system clocked at 200 MHz. An obtained precision was measured with a mean-squared error which is 0.030 for 8-bit fixed number representation.
The authors' filter, presented in [23], occupies 12020 LUTs (no DSP blocks are utilized) and achieves maximum voxel processing rate of 125 MHz which leads to 409 frames/s for 640 × 480 images. The mean-squared error for 8-bit fixed number representation is 0.002. Comparing obtained results one can see that the proposed filter voxel processing rate is at the clock rate. This is a natural property of fully pipeline structures. The number of ALUTs occupied by the proposed filter is greater than that in [1] and probably follows from the CORDIC implementation; however, we do not utilize extra DSP blocks. The proposed filter possesses good precision, due to Givens rotations, manifested by the low mean-squared error and other good properties. For more details the reader is referred to [23].