Optimally Factored IFIR Filters
Abstract
This paper presents a new design method and a corresponding architecture for creating FIR filters that are significantly more hardware-efficient than presently known implementations. These optimally factored IFIR filters are also easily pipelined, thereby allowing operation at much higher data rates. Using examples introduced by previous researchers, we demonstrate substantially better hardware efficiency. Two such examples show hardware reductions in the vicinity of 50% relative to the conventional Remez structures, whereas previous research targeting this matter reports more modest results. We also show new features and further benefits that can be obtained by using optimally factored IFIR filters.
Keywords
FIR filter · Factored filter · IFIR · Interpolated FIR · Optimal FIR filter · Cascade filter · Optimal filter design · Filter hardware complexity
1 Introduction
An interpolated FIR digital filter (often referred to as an “IFIR filter”) is well known to filter designers [6, 7, 11, 17, 19]. Its architecture can be very efficient for realizing narrowband lowpass (or, by simple mapping, highpass) filters. The IFIR transfer function H(z) is constructed as a cascade of two FIR filters, H(z) = G(z^{L})I(z), where the n-tap FIR filter G(z) (often called the model filter) has its argument z replaced by z^{L} for a positive integer L (called the “stretch factor,” subsequently referred to as “SF”). This replacement is equivalent to “stretching” the length of filter G to become approximately L times as long; more precisely, it will have 1 + (n − 1)L taps, the majority of whose coefficients are zero and therefore incur no hardware cost for tap-coefficient multipliers or their structural adders. Stretching in the time domain is equivalent to “shrinking” the transfer function G(e^{jω}) by the factor L in the frequency domain, which explains why such functions can be efficient for narrowband filters. This frequency-domain shrinking, however, creates unwanted passbands (images) centered at ω = 2π/L, 4π/L, …, 2π(L − 1)/L, which must be removed (masked) by the lowpass filter I(z), called the interpolator or masking filter; I(z) is cheap because its transition band is wide. Refs. [6, 17, 19] provide more details on IFIR filters and their properties, and [7, 11] show how to choose an optimum stretch factor (SF) so that a filter will most efficiently meet given passband and stopband specifications.
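The stretching and masking described above can be checked numerically. The following is a minimal Python sketch, assuming hypothetical 3-tap coefficient lists `g` and `i_mask` (not taken from any example in this paper): `stretch` forms G(z^L) by zero insertion, `cascade` multiplies two transfer functions by convolving their coefficients, and `response` evaluates H(e^{jω}) directly.

```python
import cmath
import math


def stretch(g, L):
    """Coefficients of G(z^L): insert L - 1 zeros between consecutive taps of g."""
    if L < 1:
        raise ValueError("stretch factor L must be a positive integer")
    out = [0.0] * (1 + (len(g) - 1) * L)
    for k, c in enumerate(g):
        out[k * L] = c
    return out


def cascade(a, b):
    """Coefficients of A(z)B(z), i.e., the linear convolution of a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def response(h, w):
    """Frequency response H(e^{jw}) = sum_n h[n] e^{-jwn}."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


# Hypothetical model filter and masking filter (illustration only).
g = [0.25, 0.5, 0.25]
i_mask = [0.25, 0.5, 0.25]
L = 3

g_L = stretch(g, L)
print(len(g_L))                        # 7, i.e., 1 + (n - 1)L taps
print(sum(1 for c in g_L if c != 0))   # 3: only the original n taps are nonzero

# G(e^{jwL}) repeats every 2*pi/L, so the image at w = 2*pi/L has the same
# magnitude as the passband at w = 0 ...
print(abs(response(g_L, 2 * math.pi / L)), abs(response(g_L, 0.0)))
# ... and the masking filter I(z) is what attenuates that image.
h = cascade(g_L, i_mask)
print(abs(response(h, 2 * math.pi / L)))
```

With these toy coefficients the image at 2π/3 is reduced from unity to a quarter by I(z); a practical masking filter is of course designed to push it below the stopband specification.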
For additional details on our “optimal factoring algorithm,” we refer the reader to [12, 14]. Here we extend this work to IFIR filter implementations.
Table 1 Hardware-complexity summary and comparison (order-15 Example 1); parameter B is the data width (wordlength)
1.1 In Regard to Optimal Factoring Filter Design
In the Fig. 4 factored filter, this yields the three fourth-order stages and leaves the 180° zero and the 87° zero pair to stand alone as first-order and second-order blocks, respectively. Using a similar approach, we now present a previously unexplored method for making optimally factored IFIR filters.
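Each stage in such a cascade is simply the polynomial whose roots are the zeros assigned to it. A minimal sketch, assuming (as is typical for the stopband zeros of a lowpass FIR filter) that the 180° zero and the 87° zero pair lie on the unit circle; deciding which zeros are grouped into which stage is the job of the optimal factoring algorithm [12, 14] and is not reproduced here. The helper names are hypothetical.

```python
import cmath
import math


def zero_pair_stage(theta_deg):
    """Second-order stage with unit-circle zeros at exp(+/- j*theta): 1 - 2cos(theta) z^-1 + z^-2."""
    return [1.0, -2.0 * math.cos(math.radians(theta_deg)), 1.0]


def single_zero_stage_at_180():
    """First-order stage with its zero at z = -1 (180 degrees): 1 + z^-1."""
    return [1.0, 1.0]


def cascade(a, b):
    """Coefficients of A(z)B(z) (linear convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def response(h, w):
    """H(e^{jw}) = sum_n h[n] e^{-jwn}."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


h = cascade(zero_pair_stage(87.0), single_zero_stage_at_180())
for deg in (87.0, 180.0):
    # Each stage places a transmission null at its zero angle.
    print(deg, abs(response(h, math.radians(deg))))
```

Because the stages multiply, the nulls of every stage appear in the overall response regardless of how the stages are ordered.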
Table 1, line 2, gives our assessment of the 10.1% hardware savings expected from this Fig. 4 non-IFIR factoring example. This may seem a relatively minor improvement. However, we shall see that this small (degree-15) filter, which does not display many features that would identify it as a particularly good candidate for IFIR implementation (for example, it does not have a very narrow transition band), still achieves a rather impressive 25% hardware reduction once we combine optimal factoring with the IFIR architecture (as shown in the last row of Table 1). For larger and more demanding filters, we have found, and will demonstrate herein, that even greater percentage reductions in hardware can be expected. Before discussing IFIR factoring further, we briefly explain our Table 1 computations. (Similar hardware-assessment techniques are used throughout our subsequent discussions.)
1.2 Concerning Our Assessments of Filter Hardware Costs
In our subsequent discussions of IFIR transfer functions of the form G(z^{L}), stretch factors L > 1 will lead us to consider FIR filter structures having a cascade of numerous z^{−1} delays as alternatives to structures having fewer delays but more numerous (and often more expensive) tap-coefficient multipliers. We consider all FIR filters discussed here to have fixed-point binary multiplier coefficient values, implemented efficiently by circuits that employ hardwired data shifts and additions of the shifted data (multiplier adders). This is true (and commonplace) for direct-form as well as transposed-form FIR filter hardware implementations.
The “hardware efficiency” of a circuit is affected by the number of additions required, which we assess in terms of the number of “multiplier adders” (MA) used. Other adders, so-called “structural adders” (SA), are also required to implement, for example, “plus or minus” operations like those shown within the boxes of the cascade structure at the top of Fig. 4 or, more generally, the additions that combine data in a conventional direct-form or transposed-form FIR filter. Ultimately, this “adder hardware” is assessed as the total number of single-bit full adders required to build the multiplier adders and the structural adders. In doing this for our various examples, we also account for the simplifications routinely employed by those skilled in the art, such as subexpression sharing.
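As a rough illustration of how coefficient values translate into multiplier adders, the sketch below uses the canonical signed-digit (CSD) representation: a coefficient with k nonzero CSD digits costs k − 1 shift-add operations. It is a simplified model under stated assumptions (every adder is B bits wide, no subexpression sharing, coefficient signs absorbed by the structural adders), not the exact accounting used for the tables in this paper; the names `csd_digits`, `multiplier_adders`, and `adder_cost` are hypothetical.

```python
def csd_digits(n):
    """CSD (non-adjacent form) digits of integer n, least significant first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def multiplier_adders(coeff, frac_bits):
    """Shift-add adders for one fixed-point coefficient (no subexpression sharing)."""
    n = round(abs(coeff) * (1 << frac_bits))  # sign assumed absorbed by a structural adder
    nonzero = sum(1 for d in csd_digits(n) if d != 0)
    return max(nonzero - 1, 0)


def adder_cost(coeffs, frac_bits, B):
    """Return (MA, SA, full adders), assuming every adder is B bits wide."""
    ma = sum(multiplier_adders(c, frac_bits) for c in coeffs)
    sa = max(sum(1 for c in coeffs if c != 0) - 1, 0)  # adders combining the tap products
    return ma, sa, B * (ma + sa)


# Hypothetical 4-tap filter: 0.4375 = 2^-1 - 2^-4 needs one MA; 0.125 is a pure shift.
print(adder_cost([0.125, 0.4375, 0.4375, 0.125], frac_bits=8, B=12))  # (2, 3, 60)
```

Real designs reduce the MA count further by sharing subexpressions across coefficients and by sizing each adder to the bits it actually needs, which this per-coefficient model ignores.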
In Sect. 2, we use the small order-15 FIR filter, whose transfer-function magnitude |H(e^{jω})| is plotted in Fig. 1 (dashed line), to illustrate some of the basic concepts of our new optimally factored IFIR filter design and implementation.
The important new concept of joint (versus individual) sequencing of the two sets of stages, those of the model filter and those of the interpolator filter, will also be introduced. The resulting filter structures are compared with the non-interpolated optimally factored (Fig. 4) design and with the conventional Remez implementation.
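Why stage ordering matters at all can be seen with a toy example. In exact arithmetic the overall transfer function of a cascade does not depend on the order of its stages (polynomial multiplication is commutative), but the signal levels at the intermediate stage outputs do, and those levels drive the wordlength and noise requirements of each stage. The sketch below measures each stage output by its worst-case gain, the ℓ1 norm Σ|h[n]| of the partial cascade; that metric and the two hypothetical stages are assumptions for illustration only, not the joint-sequencing criterion used in this paper.

```python
def cascade(a, b):
    """Coefficients of A(z)B(z) (linear convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def worst_case_gain(h):
    """Peak |output| for any input bounded by 1: the l1 norm sum |h[n]|."""
    return sum(abs(c) for c in h)


def partial_gains(stages):
    """Overall impulse response and the worst-case gain at each stage output."""
    h, gains = [1.0], []
    for s in stages:
        h = cascade(h, s)
        gains.append(worst_case_gain(h))
    return h, gains


a = [1.0, 2.0, 1.0]    # hypothetical high-gain stage
b = [0.25, 0.0, 0.25]  # hypothetical low-gain stage
h_ab, g_ab = partial_gains([a, b])
h_ba, g_ba = partial_gains([b, a])
print(h_ab == h_ba)  # True: same overall transfer function
print(g_ab, g_ba)    # [4.0, 2.0] [0.5, 2.0]: different intermediate headroom
```

Placing the attenuating stage first halves the peak intermediate level here, which is the kind of trade-off a sequencing procedure has to weigh across many stages at once.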
Additional techniques and benefits are presented in Sects. 3 and 4 by examining optimally factored IFIR designs of two highly cited high-order filters.
2 Degree-15 Filter Example: Choice of Stretch Factor and New Joint Stage-Sequencing Technique
Example 1
Following [11], an optimum stretch factor for this filter is obtained; as discussed previously, the two candidates, indicated in Fig. 5, are SF = 2 and SF = 3, and, as summarized in Table 1, SF = 3 is the better choice because of its slightly greater hardware efficiency. Note that the exact hardware costs will ultimately depend on the details of any specific implementation.
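To see which stretch factors are even admissible for given band edges, the textbook IFIR edge mapping [17] can be coded directly: G(z) gets edges Lω_p and Lω_s, and I(z) must pass [0, ω_p] and stop from 2π/L − ω_s. The sketch below adds Bellanger's rough length estimate as a swapped-in heuristic to compare candidates; it is not the optimization of [11] used for Fig. 5, and it ignores coefficient quantization and adder costs. Since Example 1's specifications are not restated here, it uses Example 3's band edges (ω_p = 0.2π, ω_s = 0.28π, 60-dB stopband) with a hypothetical passband ripple, and splits that ripple equally between G and I (an assumption).

```python
import math


def feasible_stretch_factors(wp, ws, l_max=64):
    """Stretch factors L > 1 for a lowpass IFIR with band edges wp < ws (units of pi rad/sample).

    The stretched model filter needs L*ws < 1 (G's stopband edge below pi), and the
    masking filter must pass [0, wp] and stop from 2/L - ws, so 2/L - ws > wp.
    """
    return [L for L in range(2, l_max + 1) if L * ws < 1 and 2 / L - ws > wp]


def bellanger_taps(dp, ds, transition):
    """Bellanger's rough FIR length estimate; `transition` is in units of pi rad/sample."""
    return (2 / 3) * math.log10(1 / (10 * dp * ds)) / (transition / 2)


def ifir_nonzero_taps(wp, ws, dp, ds, L):
    """Rough nonzero-tap count of G(z^L)I(z), giving each filter half the passband ripple."""
    n_model = bellanger_taps(dp / 2, ds, L * (ws - wp))
    n_mask = bellanger_taps(dp / 2, ds, 2 / L - ws - wp)
    return n_model + n_mask


# Band edges of Example 3 (60-dB stopband); the passband ripple dp is hypothetical.
wp, ws, dp, ds = 0.2, 0.28, 0.01, 10 ** (-60 / 20)
print(feasible_stretch_factors(wp, ws))  # [2, 3]
print("direct:", round(bellanger_taps(dp, ds, ws - wp)))
for L in feasible_stretch_factors(wp, ws):
    print("L =", L, "IFIR:", round(ifir_nonzero_taps(wp, ws, dp, ds, L)))
```

Tap counts are only a proxy: as the text notes, the final choice of SF depends on the actual adder and register costs of a specific implementation.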
Table 1 compares the hardware complexity of the optimally factored IFIR filters with that of the conventional Remez (direct-form) FIR filter, the (non-IFIR) optimally factored filter, and the non-factored IFIR filters. Clearly, the optimally factored SF = 3 IFIR filter has the fewest adders and the lowest total complexity. Owing to the modest 22-dB stopband attenuation target in Example 1, the signal-path wordlength can be as short as six bits for the optimally factored IFIR cascade implementations.
3 An Order-59 Filter Example [18]: Factored IFIR Efficiency and Additional Benefits
Example 2
Table 2 Hardware-complexity comparison for the order-59 Example 2 FIR filter (B represents the wordlength of the datapath)
Benefit: If desired, the optimally factored IFIR filter easily allows a nonuniform datapath wordlength across the stages of the cascade. This can efficiently deliver better noise performance, because the dynamic range of each stage output can be adjusted optimally and independently, as discussed in Sect. 4.
Parameter B in Table 2 is the datapath wordlength, which should be at least 12 bits (including the sign bit) for a single-stage conventional design to have enough resolution to realize a 60-dB attenuation of the incoming signal. For the Fig. 15 factored IFIR filter, as discussed earlier, a wordlength of 14 bits is required. Table 2 also compares the complexity of the FIRGAM method [2], the original CSD implementation of this example filter [18] (which, being an early CSD filter, focused on reducing adder costs only), the PMILP algorithm [25], the minimum-adder MILP [21], the cascade method [22], and the genetic-algorithm cascade [26].
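A common rule of thumb relates attenuation to wordlength at about 20·log10(2) ≈ 6.02 dB per bit. A minimal sketch, assuming one sign bit plus one guard bit (the guard bit is our assumption, not a value stated in this paper): under these assumptions it happens to reproduce the 12-bit single-stage figure for 60 dB and the six-bit figure quoted for Example 1's 22-dB target, but it does not model the roundoff accumulation that leads to 14 bits for the Fig. 15 cascade.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB of dynamic range per bit


def rough_wordlength(atten_db, guard_bits=1):
    """Rule-of-thumb datapath width: magnitude bits for atten_db, plus a sign bit and guard bits.

    The default single guard bit is an assumption of this sketch.
    """
    return math.ceil(atten_db / DB_PER_BIT) + 1 + guard_bits


print(rough_wordlength(60))  # 12 (Example 2's 60-dB target)
print(rough_wordlength(22))  # 6  (Example 1's 22-dB target)
```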
Comparisons: Area, Speed, and Power Consumption
The fully pipelined optimally factored structure has fourteen pipelining registers (one at the output of each of the first fourteen stages in Fig. 15). Table 3 shows that the non-pipelined factored IFIR filter has the smallest area but the longest critical path and, as stated earlier, is suitable only for applications that do not require high speed. Because of its long critical path, the synthesis tool had to increase its logic-gate sizes to operate at 100 MHz, resulting in slightly higher power consumption than the pipelined designs; its maximum operating speed was 160 MHz. In contrast, the fully pipelined optimally factored designs had the shortest critical paths, so the synthesis tool could achieve very high sampling rates using mostly small gate cells. While the transposed design and the fully pipelined factored design can both reach at most 900 MHz, the transposed design requires considerably larger gates (and hence considerably more area and power) to operate at this speed.
At 900 MHz, the conventional transposed design requires 3.5 times the area and 53% more power than the optimally factored IFIR filter.
The conventional direct-form filter, moreover, can operate only at speeds up to 500 MHz, and even at that relatively low speed it requires 2.3 times the area and 28% more power than the fully pipelined Fig. 15 optimally factored IFIR filter.

Test 1) The input signal is white Gaussian noise (uniform power across all frequencies). We expect the filter to attenuate the stopband portion of the signal by 60 dB.

Test 2) The input signal is colored Gaussian noise with uniform power within the stopband: a sum of 100 random-phase sinusoids uniformly distributed across the stopband (ω ≥ 0.14π). We expect a 60-dB attenuation of the entire signal.

Test 3) The input signal is one sinusoid at the passband edge.

Test 4) The input signal is one sinusoid at the stopband edge.
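Stimuli like these are straightforward to generate. A minimal sketch for the Test 2 stimulus and the RMS-based attenuation measurement, assuming evenly spaced tone frequencies (the text says only "uniformly distributed"), unit amplitudes, a fixed random seed, and a hypothetical record length of 2000 samples; the helper names are ours, and a scaled copy of the input stands in for a real filter output.

```python
import math
import random


def multitone(num_tones, w_lo, w_hi, length, seed=0):
    """Sum of random-phase unit-amplitude cosines, evenly spaced in [w_lo, w_hi] rad/sample."""
    rng = random.Random(seed)
    if num_tones == 1:
        freqs = [w_lo]
    else:
        step = (w_hi - w_lo) / (num_tones - 1)
        freqs = [w_lo + k * step for k in range(num_tones)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    return [sum(math.cos(w * n + p) for w, p in zip(freqs, phases)) for n in range(length)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def attenuation_db(x_in, y_out):
    """Attenuation of y_out relative to x_in in dB, from RMS values.

    For a real filter, drop the first len(h) - 1 output samples (start-up transient) first.
    """
    return 20.0 * math.log10(rms(x_in) / rms(y_out))


# Test 2 stimulus for Example 2: 100 tones across the stopband 0.14*pi <= w <= pi.
x = multitone(100, 0.14 * math.pi, math.pi, length=2000)
# Stand-in for a filter output: a copy scaled by 10^-3 corresponds to 60-dB attenuation.
y = [1e-3 * v for v in x]
print(round(attenuation_db(x, y), 6))  # 60.0
```

Tests 3 and 4 are the single-tone special case, e.g. `multitone(1, w, w, length)` with w set to the band edge.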
Figures 18 and 19 show that the Fig. 15 optimally factored IFIR filter fully attenuates (by at least 60 dB) the stopband portions of the input signal (including a sinusoid at the stopband edge) and passes the passband signals (including a sinusoid at the passband edge) with negligible (less than 0.1-dB) attenuation.
Figure 19 shows the progression of the RMS stage outputs through the cascade for the two sinusoidal test cases at the passband and stopband edges (Tests 3 and 4).
An Additional Benefit provided by the inherent flexibility of the optimally factored IFIR filter in Fig. 15:
If a very modest 0.019-dB increase in the passband ripple is allowed (i.e., from ± 0.1035 to ± 0.1225 dB), then the eighth stage, [1 − 0.46875z^{−3} + z^{−6}], of the Fig. 15 structure can be further simplified to [1 − 0.5z^{−3} + z^{−6}], while the rest of the cascade factors remain intact. The modified stage has only trivial coefficients, which further reduces the shift-add operations needed to implement the Fig. 15 filter coefficients (a 10% reduction, from ten to nine multiplier adders). The importance of this observation is that:
In general, we have found that, given a minor (usually acceptable) allowance in some of the target filter specifications, it is often possible to exploit it to further simplify a specific stage (or stages!) of the optimally factored IFIR filter. In particular, this can be done without changing any of the other stages, thereby reducing the filter’s overall hardware complexity.
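The ten-to-nine reduction can be traced to a single coefficient. A small sketch, assuming coefficient signs are absorbed by the structural adders and counting one shift-add per nonzero canonical-signed-digit (CSD) digit beyond the first: 0.46875 = 2^{−1} − 2^{−5} costs one multiplier adder, whereas 0.5 is a pure hardwired shift. Only the nonzero taps of the eighth stage are listed, since its zero taps cost nothing; the function names are hypothetical.

```python
def csd_nonzero_count(n):
    """Number of nonzero digits in the canonical signed-digit form of integer n >= 0."""
    count = 0
    while n != 0:
        if n & 1:
            n -= 2 - (n & 3)  # pick digit +1 or -1 so that the next digit is 0
            count += 1
        n >>= 1
    return count


def stage_multiplier_adders(nonzero_taps, frac_bits=16):
    """Shift-add multiplier adders for one stage (signs assumed absorbed by SAs)."""
    total = 0
    for c in nonzero_taps:
        digits = csd_nonzero_count(round(abs(c) * (1 << frac_bits)))
        total += max(digits - 1, 0)
    return total


eighth_stage = [1.0, -0.46875, 1.0]  # 1 - 0.46875 z^-3 + z^-6
simplified = [1.0, -0.5, 1.0]        # 1 - 0.5 z^-3 + z^-6
print(stage_multiplier_adders(eighth_stage))  # 1
print(stage_multiplier_adders(simplified))    # 0
```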
4 A Hardware-Efficient Wideband Filter Design via Optimally Factored IFIR Implementation: Order-62 Filter Example from [2, 8, 27]
Example 3
Like the order-59 filter in Sect. 3, this filter, referred to as filter L2 in [2], is a convenient example because several previous publications [2, 8, 18, 21, 22, 25, 27] have used it to present their own filter design and implementation methods. These include the FIRGAM and Remez algorithms [2], an algorithm (LIM) from [8], the Partial Mixed-Integer Linear Programming (PMILP) algorithm of [25], and the single-stage and dual-stage designs using the coefficient optimization algorithms in [21, 22].
We first demonstrate an optimally factored IFIR implementation of filter L2 and compare its complexity with the above-cited designs. This implementation also provides an opportunity to demonstrate ANOTHER BENEFIT of our optimally factored filters: because our FIR factors are relatively small, it is often possible to find (otherwise not particularly obvious) opportunities to further reduce the number of add operations required to implement some FIR coefficients.
Table 4 Model filter G(z): quantized stages and binary representations for the L2 filter, using the optimal pairing identified in Fig. 21a
Table 5 Interpolator filter I(z): quantized stages and binary representations for the L2 filter, using the optimal pairing identified in Fig. 21a
The binary coefficient values for G(z) and I(z), listed in Tables 4 and 5, indicate that most factors can be implemented very cheaply. Indeed, only Factor 5 and Factor 6 (the two largest factors) have coefficients requiring more than one MA (multiplier adder). The Appendix explains how each of these factors can be implemented with just two MA. (Admittedly, this somewhat blurs the distinction between MA and SA, since it increases the number of SA.) Overall, however, we achieve a net reduction of one addition per factor: we need 2 MA and 8 SA for Factor 5, and the same for Factor 6.
Hardware-complexity comparison for the order-62 wideband FIR filter (B represents the wordlength of the datapath)

Test 1) The input signal is an ensemble of 50 random-phase in-band sinusoids (ω ≤ 0.2π). We expect the signal to traverse the factored filter unaffected, so that the output is a delayed version of the input.

Test 2) The input signal is white Gaussian noise (uniform power across all frequencies). We expect the portion of the signal within the stopband (ω ≥ 0.28π) to be attenuated by 60 dB.

Test 3) The input signal is colored Gaussian noise with uniform power only in the stopband, realized as a sum of 100 random-phase sinusoids uniformly distributed in the stopband (ω ≥ 0.28π). We expect the filter to attenuate the entire signal by at least 60 dB.

Test 4) The input signal is a sinusoid at the passband edge, ω_{p} = 0.2π.

Test 5) The input signal is a sinusoid at the stopband edge, ω_{s} = 0.28π.
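Test 1's expectation that the output be a delayed copy of the input follows from linear phase: a symmetric FIR filter of order N has H(e^{jω}) = A(ω)e^{−jωN/2} with A(ω) real, so passband signals emerge delayed by N/2 samples. Assuming the factored cascade preserves the linear phase of the order-62 design, that delay is 31 samples. The sketch checks the property on a hypothetical symmetric 5-tap filter; for a cascade of symmetric stages, the stage delays simply add.

```python
import cmath
import math


def is_symmetric(h, tol=1e-12):
    """True if h[n] == h[N - n] for all n (a Type I/II linear-phase FIR filter)."""
    return all(abs(a - b) <= tol for a, b in zip(h, reversed(h)))


def zero_phase_response(h, w):
    """A(w) = H(e^{jw}) * e^{jwN/2}, with N = len(h) - 1; real-valued for symmetric h."""
    N = len(h) - 1
    H = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
    return H * cmath.exp(1j * w * N / 2)


def cascade_delay(stage_orders):
    """Group delay (samples) of a cascade of symmetric stages: the sum of N_k / 2."""
    return sum(stage_orders) / 2


h = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical symmetric filter, order N = 4
print(is_symmetric(h))  # True
for w in (0.1 * math.pi, 0.2 * math.pi, 0.28 * math.pi):
    a = zero_phase_response(h, w)
    print(abs(a.imag) < 1e-12, a.real)  # imaginary part vanishes: pure delay of N/2
print(cascade_delay([62]))  # 31.0
```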
A slightly more efficient realization is also possible by exploiting the inherent flexibility of the factored structure, which (as mentioned for Example 2) can accommodate a nonuniform datapath wordlength (i.e., different truncation/rounding levels) throughout the cascade. According to Fig. 25, 15 bits are needed for truncation at the outputs of stages #1, #2, #3, #10, #11, #12, and #13 (to accommodate up to a 6-dB increase in the stage-output RMS values relative to the RMS of the filter input), whereas only 14 bits are needed at the outputs of stages #4, #5, #6, #7, #8, and #9.
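Sizing each stage from its measured RMS level can be sketched as follows. Assumptions: one extra integer bit per 20·log10(2) ≈ 6.02 dB of RMS growth over the filter input (consistent with the 6-dB, 15-versus-14-bit figures quoted above), a base wordlength of 14 bits, and a hypothetical two-stage cascade with a constant (DC) test input standing in for the actual stages and test signals.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB of headroom per extra integer bit


def fir(h, x):
    """Direct-form FIR output with zero initial state, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def stage_wordlengths(x, stages, base_bits):
    """Bits per stage output: base_bits plus one per ~6.02 dB of RMS growth over the input."""
    ref = rms(x)
    y, bits = x, []
    for h in stages:
        y = fir(h, y)
        gain_db = 20 * math.log10(rms(y) / ref)
        bits.append(base_bits + max(0, math.ceil(gain_db / DB_PER_BIT)))
    return bits


# Hypothetical two-stage cascade driven by a constant (DC) input.
x = [1.0] * 100
print(stage_wordlengths(x, [[0.5, 0.5], [1.0, 1.0]], base_bits=14))  # [14, 15]
```

RMS-based sizing depends on the test signal; worst-case (ℓ1-norm) bounds would be more conservative and would typically assign more bits.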
Noise analysis for the factored IFIR structure in Fig. 22:
5 Conclusion
In this paper, we have presented a general method, and a corresponding structure, that appear to be quite superior for achieving significantly more hardware-efficient implementations of FIR filters. This advancement employs our recently announced “optimal factoring of FIR filters.” We have demonstrated that applying optimal factoring to well-designed IFIR filters yields much better (more hardware-efficient) FIR digital filters. Assessing hardware cost as the sum of the required full adders and flip-flops, we have shown that such optimally factored IFIR filters can provide substantially lower hardware cost than the methods presented in previous research publications. (Two of our examples show hardware reductions in the vicinity of 50% relative to conventional Remez implementations. Indeed, the recent publication [15] shows these results to be quite close to a new “lower bound” on the hardware complexity of any FIR implementation that meets the specifications of these two FIR filters.) As shown in Table 3, our optimally factored IFIR filters can be particularly beneficial when specifications push the limits of the technology’s speed, and in these cases their area and power savings still appear quite substantial. Further properties, benefits, and alternative implementations of these filters have also been demonstrated on well-known examples, further confirming the utility of optimally factored IFIR filters relative to more conventional implementations.
An extension of this paper’s optimal factoring of IFIR filters to the optimal factoring of FRM (frequency-response masking) filters is also evident. (Please see [6] for FRM details.) Basically, the FRM structure extends the IFIR structure with additional FIR-type hardware in order to accommodate a broader class of FIR filters, including certain highpass and bandpass FIR filters whose direct implementation via an IFIR structure could be problematic. Figs. 3(a) and 5 of [6] show that one can start with an IFIR structure and add two more FIR blocks to obtain an FRM filter implementation that may be better suited to some desired filters, primarily bandpass FIR structures. While certain complications may arise when attempting to implement FIR factoring efficiently in an FRM filter (e.g., one basic issue concerns the desire to preserve a pure delay chain z^{−Ln}, whose length may be substantial and which may thus seem inconsistent with FIR factoring), it can still be envisioned that the type of FIR factoring presented here could be extended to FRM filters. This would, of course, be a possible topic for future research.
Footnotes
 1.
This paper’s optimally factored IFIR structure and the design methods for finding optimal factors, scaling and sequencing them are patent pending.
References
1. J.W. Adams, A.N. Willson Jr., A new approach to FIR digital filters with fewer multipliers and reduced sensitivity. IEEE Trans. Circuits Syst. 30(5), 277–283 (1983)
2. M. Aktan, A. Yurdakul, G. Dündar, An algorithm for the design of low-power hardware-efficient FIR filters. IEEE Trans. Circuits Syst. I Regul. Pap. 55(6), 1536–1545 (2008)
3. D.S.K. Chan, L.R. Rabiner, An algorithm for minimizing roundoff noise in cascade realizations of finite impulse response digital filters. Bell Syst. Tech. J. 52(3), 347–385 (1973)
4. A.G. Dempster, M.D. Macleod, Use of minimum-adder multiplier blocks in FIR digital filters. IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 42(9), 569–577 (1995)
5. M. Faust, C.H. Chang, Optimization of structural adders in fixed coefficient transposed direct form FIR filters, in Proceedings of the IEEE International Symposium on Circuits and Systems (2009), pp. 2185–2188
6. Y.C. Lim, Frequency-response masking approach for the synthesis of sharp linear phase digital filters. IEEE Trans. Circuits Syst. 33(4), 357–364 (1986)
7. Y.C. Lim, Y. Lian, The optimum design of one- and two-dimensional FIR filters using the frequency response masking technique. IEEE Trans. Circuits Syst. II 40(2), 88–95 (1993)
8. Y.C. Lim, S.R. Parker, Discrete coefficient FIR digital filter design based upon an LMS criteria. IEEE Trans. Circuits Syst. 30(10), 723–739 (1983)
9. J.H. McClellan, T.W. Parks, L.R. Rabiner, A computer program for designing optimum FIR linear phase digital filters. IEEE Trans. Audio Electroacoust. 21(6), 506–526 (1973)
10. A. Mehrnia, M. Dai, A.N. Willson Jr., Efficient halfband FIR filter structures for RF and IF data converters. IEEE Trans. Circuits Syst. II Express Briefs 63(1), 64–68 (2016)
11. A. Mehrnia, A.N. Willson Jr., On optimal IFIR filter design, in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 3 (2004), pp. 133–136
12. A. Mehrnia, A.N. Willson Jr., Optimal factoring of FIR filters. IEEE Trans. Signal Process. 63(3), 647–661 (2015)
13. A. Mehrnia, A.N. Willson Jr., Further desensitized FIR halfband filters. IEEE Trans. Circuits Syst. I Regul. Pap. 62(7), 1815–1824 (2015)
14. A. Mehrnia, A.N. Willson Jr., FIR filter design via extended optimal factoring. IEEE Trans. Signal Process. 64(4), 1061–1075 (2016)
15. A. Mehrnia, A.N. Willson Jr., A lower bound for the hardware complexity of FIR filters. IEEE Circuits Syst. Mag. 18(1), 10–28 (2018)
16. S. Nakamura, S.K. Mitra, Design of FIR digital filters using tapped cascaded FIR subfilters. Circuits Syst. Signal Process. 1(1), 43–56 (1982)
17. Y. Neuvo, C.Y. Dong, S.K. Mitra, Interpolated finite impulse response filters. IEEE Trans. Acoust. Speech Signal Process. 32(3), 563–570 (1984)
18. H. Samueli, An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients. IEEE Trans. Circuits Syst. 36(7), 1044–1047 (1989)
19. T. Saramaki, Y. Neuvo, S.K. Mitra, Design of computationally efficient interpolated FIR filters. IEEE Trans. Circuits Syst. 35(4), 70–88 (1988)
20. W. Schüssler, On structures for nonrecursive digital filters. Arch. Elek. Übertragung 26(6), 255–258 (1972)
21. D. Shi, Y.J. Yu, Design of linear phase FIR filters with high probability of achieving minimum number of adders. IEEE Trans. Circuits Syst. I Regul. Pap. 58(1), 126–136 (2011)
22. D. Shi, Y.J. Yu, Design of discrete-valued linear phase FIR filters in cascade form. IEEE Trans. Circuits Syst. I Regul. Pap. 58(7), 1627–1636 (2011)
23. P.P. Vaidyanathan, G. Beitman, On prefilters for digital FIR filter design. IEEE Trans. Circuits Syst. 32(5), 494–499 (1985)
24. A.N. Willson Jr., Desensitized halfband filters. IEEE Trans. Circuits Syst. I 57(1), 152–165 (2010)
25. C.Y. Yao, C.J. Chien, A partial MILP algorithm for the design of linear phase FIR filters with SPT coefficients. IEICE Trans. Fundam. E85-A, 2302–2310 (2002)
26. W.B. Ye, Y.J. Yu, Single-stage and cascade design of high order multiplierless linear phase FIR filters using genetic algorithm. IEEE Trans. Circuits Syst. I Regul. Pap. 60(11), 2987–2997 (2013)
27. Y.J. Yu, Y.C. Lim, Design of linear phase FIR filters in subexpression space using mixed integer linear programming. IEEE Trans. Circuits Syst. I Regul. Pap. 54(10), 2330–2338 (2007)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.