Introduction

Multiplication is a fundamental arithmetic operation and plays a significant role in digital signal processing (DSP) applications. In a multiplier, the partial products generated during multiplication are summed using adders. As the number of partial products increases, more adders are required, which increases power consumption. The need for low-power multipliers has therefore grown as technology has advanced [10]. Low-power techniques address this need by reducing the number of partial products and the switching activity involved in adding them. Many of these techniques reduce dynamic power by dividing the arithmetic unit into two sections, the most significant part (MSP) and the least significant part (LSP), and turning off the section that does not affect the result, an approach known as partially guarded computation (PGC). Experimental findings have shown that the PGC approach reduces power consumption by 10–44% in array multipliers. Another low-power design approach detects the effective dynamic range of the operands, performs the addition only over that range, and then extends the result back to the original word length. Simulation results show that such low-power adders perform computations more efficiently than traditional adders. A multiplier design that reduces the switching activity of partial products by using the radix-4 Booth algorithm achieves low power consumption at the cost of increased delay and area. Glitch power can be reduced by replacing existing gates with ones that have a control input, and shortening the settling time after reactivation reduces the minimum power-down time. The proposed system employs two techniques: canonical signed digit (CSD) recoding and the spurious power suppression technique (SPST). CSD recoding reduces the number of nonzero digits in a number, thereby reducing the number of partial products. SPST divides each N-bit binary operand into an MSP and an LSP and computes the MSP result only when it affects the outcome of the computation, which lowers the dynamic power and reduces the overall power consumption of the VLSI combinational circuit [1].

Related work

Choi, Jeon, and Choi focused on power minimization of functional units through partially guarded computation and presented their findings at the IEEE International Symposium on Low Power Electronics and Design in 2000. Their technique selectively activates components within functional units based on the input data, so that parts of a unit operate only when necessary and unnecessary power consumption is avoided. The paper details the methodology and experimental results, which show significant power savings in the design of functional units while maintaining system performance. The work serves as a reference for researchers and practitioners in low-power electronics design and guides the development of energy-efficient electronic systems [2].

Chen, Sheen, and Wang presented a low-power adder design that operates on effective dynamic data ranges. Their work, published in the IEEE Transactions on Very Large Scale Integration (VLSI) Systems in 2002, aimed to reduce power consumption in adder circuits. By adjusting the data range dynamically based on the input operands, the proposed adder achieved significant power savings while maintaining accuracy. The paper provides a comprehensive analysis of the design methodology, experimental results, and comparative evaluations with other adder architectures [3].

In the study titled “Minimization of switching activities of partial products for designing low-power multipliers,” the authors developed a low-power design technique that focuses on minimizing the switching activities of partial products. They examined the switching activities of several partial product generation schemes and identified the factors that influence power usage. The proposed method is a modified partial product generation technique that lowers switching activities [4].

The authors of the study “Glitch power minimization by selective gate freezing” developed a method to reduce glitch power consumption in digital circuits by selectively freezing specific gates. Glitches are short-lived, undesired voltage changes that occur in combinational circuits when logic gates switch. These glitches can waste a large amount of power, increasing the circuit's overall power usage. The proposed algorithms identify the gates that should be frozen based on their glitch-generating behavior [5]. Benini et al., publishing in the IEEE Transactions on Very Large Scale Integration (VLSI) Systems in 2000, likewise addressed glitch-induced power consumption with selective gate freezing. Their approach selectively freezes certain gates in the circuit during inactive periods, reducing dynamic power dissipation. The paper presents a detailed analysis of the methodology, experimental results, and comparisons with existing glitch power reduction techniques, and the findings show significant power savings while maintaining circuit functionality [6].

Henzler et al. presented a fast power-efficient circuit-block switch-off scheme in Electronics Letters in 2004. The authors addressed power consumption in integrated circuits by proposing a technique that selectively switches off circuit blocks during periods of inactivity. By employing a fast switch-off mechanism, the proposed scheme achieved power savings by reducing leakage and dynamic power dissipation. The paper provides a concise explanation of the technique along with experimental results demonstrating its effectiveness, and comparative evaluations with other power reduction schemes further emphasize its advantages [7].

Lakshmi et al. proposed a design technique for low-power multipliers that employs the spurious power suppression technique (SPST). The authors addressed the challenge of power consumption in multiplier circuits and introduced an approach to mitigate spurious power dissipation. The SPST method suppresses power consumption by identifying and reducing unnecessary power dissipation caused by sources such as glitching and leakage currents. The paper presents a comprehensive explanation of the design methodology, experimental results, and comparative evaluations with conventional multiplier architectures. The findings show substantial power savings while maintaining the desired functionality of the multiplier circuit [1].

Proposed method

Figure 1 shows the block diagram of an SPST-enabled CSD multiplier. The inputs, A and B, are both 16-bit numbers. The partial product candidate generator block takes input A and generates three partial product candidates, −A, 0, and A, each having 32 bits. The CSD recoding block recodes the 16-bit input B into 17-bit magnitude and sign values. For each of the 17 recoded digits, the magnitude and sign data select one of the three candidates, so 17 partial products are created. The shifting block left shifts all of them except the first, that is, the remaining 16 partial products. Of the 17 partial products, only nine are chosen for further processing, based on the nonzero magnitude values from the CSD block. The chosen partial products, except for the ninth one, are supplied to the SPST adders two at a time. The conventional adder receives the outputs from the four SPST adders and provides the final result.

Fig. 1 Canonical signed digit multiplier block diagram with spurious power suppression enabled

CSD recoding block

The canonical signed digit is one type of number representation used for arithmetic operations. It is also known as a recoding technique, since it recodes the original number to create a new one with the minimum number of nonzero digits. This guarantees that no more than about n/2 of the digits are nonzero. The canonical signed digit (CSD) representation of a number is unique, and one of its key properties is that no two consecutive digits are nonzero [8].

The CSD recoding circuit is constructed to take advantage of this property by converting three input bits into a single CSD digit, as shown in Fig. 2. The converter recodes three binary digits, b_{i+1}, b_i, and b_{i−1}, into a single CSD digit x_i, which is represented by a magnitude bit x_{i,m} and a sign bit x_{i,s}. In the sign-magnitude encoding, 0, 1, and −1 are represented as 00, 01, and 11, respectively. Two bypass signals are also employed: the input bypass signal p_i and the output bypass signal p_{i+1}. The truth table for binary to CSD conversion is shown in Table 1. When p_i = 0, a single CSD digit x_i and the bypass signal for the next stage are formed from the three binary digits b_{i+1}, b_i, and b_{i−1}. The output bypass signal p_{i+1} has the same value as the magnitude bit x_{i,m}. The magnitude bit is determined by the values of b_i and b_{i−1}, while the sign bit depends on b_{i+1} and x_{i,m}. When p_i = 1, all outputs become zero regardless of the inputs. In this case, the converter's inputs are bypassed, and the bypass signal p_{i+1} passed to the next stage is zero [9].

Fig. 2 Binary to CSD conversion block

Table 1 Truth table showing binary to CSD conversion

The 16-bit CSD recoding block is shown in Fig. 3. This circuit generates the 17-digit CSD representation of the input in the form of 17-bit magnitude values and 17-bit sign values. The recoding block is built from the single-digit binary to CSD conversion circuit, each instance of which produces the sign bit and magnitude bit of one digit.

Fig. 3 16-bit CSD recoding circuit
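The recoding described above can be sketched behaviorally as follows. Because Table 1 is not reproduced here, the gate-level relations are assumptions chosen to agree with the text: the magnitude bit is taken as b_i XOR b_{i−1}, the sign bit as b_{i+1} AND x_{i,m}, and the bit below the LSB as 0. The function names `csd_recode` and `csd_value` are illustrative only.

```python
def csd_recode(b, width=16):
    """Recode a signed `width`-bit integer into width + 1 CSD digits.

    Returns a list of (magnitude bit, sign bit) pairs, LSB first.
    """
    def bit(i):
        # Python's >> on negative ints is arithmetic, so this sign-extends b.
        # The bit below the LSB (i = -1) is assumed to be 0.
        return (b >> i) & 1 if i >= 0 else 0

    digits = []
    p = 0  # input bypass signal p_i
    for i in range(width + 1):
        if p:
            # Previous digit was nonzero: inputs are bypassed, digit is 0.
            mag, sign = 0, 0
        else:
            mag = bit(i) ^ bit(i - 1)      # assumed form of x_{i,m}
            sign = bit(i + 1) & mag        # assumed form of x_{i,s}
        digits.append((mag, sign))
        p = mag  # output bypass p_{i+1} equals the magnitude bit
    return digits


def csd_value(digits):
    """Evaluate a list of (magnitude, sign) digits back to an integer."""
    return sum((-m if s else m) << i for i, (m, s) in enumerate(digits))


signed = [(-m if s else m) for m, s in csd_recode(7)]
print(signed[:4])  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
```

For a 16-bit operand the sketch yields 17 digits with no two adjacent nonzero digits, so at most nine of them are nonzero.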

SPST adder

Figure 4 illustrates the cause of the spurious signal transitions. In the first and second cases, adding the MSPs of the two operands has no effect on the MSP of the result, whether or not there is a carry from the LSP. In both cases, the MSP result can be predicted without performing the MSP addition.

Fig. 4 Examples showing the spurious transitions

Eliminating the computation of MSP of operands can lead to a reduction in switching activities in related components, which subsequently lowers power consumption in the adder stage and minimizes glitching noises. In this study, an SPST adder has been developed that divides the adder into two sections and freezes the MSP input data if it does not affect the final sum. A detection logic circuit has been devised to identify the effective range of input and determine whether MSP results are influencing the calculation outcome. The Boolean expression used to design this logic circuit is displayed below:

$$A_{MSP}=A\left[31:16\right],\quad B_{MSP}=B\left[31:16\right]$$
(1)
$$A_{AND}=A\left[31\right]\cdot A\left[30\right]\cdots A\left[16\right]$$
(2)
$$B_{AND}=B\left[31\right]\cdot B\left[30\right]\cdots B\left[16\right]$$
(3)
$$A_{NOR}=\overline{A\left[31\right]+A\left[30\right]+\dots+A\left[16\right]}$$
(4)
$$B_{NOR}=\overline{B\left[31\right]+B\left[30\right]+\dots+B\left[16\right]}$$
(5)
$$Close=\overline{\left(A_{AND}+A_{NOR}\right)\cdot\left(B_{AND}+B_{NOR}\right)}$$
(6)

where A[m] is the mth bit of operand A and B[n] is the nth bit of operand B. A_MSP and B_MSP are the MSPs of the inputs A and B. A_AND and B_AND are one only when all of the bits in A_MSP and B_MSP, respectively, are ones, and A_NOR and B_NOR are one only when all of those bits are zeros. The detection logic unit produces three output signals: Close, Carrctrl, and Sign. The Close value determines whether the MSP section is disabled. If Close is zero, the MSP component is disabled to reduce power consumption: zero inputs are sent to the MSP adder, which removes the switching activity in the MSP section and, with it, the dynamic power usage. The MSP result is then computed in the detection logic unit, and the MSP bits are compensated by the Sign and Carrctrl signals.
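The detection condition of Eqs. 1 to 6 can be written as a short behavioral sketch. The function name `detect` and the representation of operands as 32-bit unsigned bit patterns are assumptions made for illustration.

```python
MSP_MASK = 0xFFFF  # 16-bit most significant part


def detect(a, b):
    """Evaluate Eqs. 1-6 for two 32-bit operands given as bit patterns.

    Returns (close, a_and, a_nor, b_and, b_nor) as 0/1 values.
    """
    a_msp = (a >> 16) & MSP_MASK          # Eq. 1
    b_msp = (b >> 16) & MSP_MASK
    a_and = int(a_msp == MSP_MASK)        # Eq. 2: all MSP bits of A are ones
    b_and = int(b_msp == MSP_MASK)        # Eq. 3
    a_nor = int(a_msp == 0)               # Eq. 4: all MSP bits of A are zeros
    b_nor = int(b_msp == 0)               # Eq. 5
    close = 1 - ((a_and | a_nor) & (b_and | b_nor))  # Eq. 6
    return close, a_and, a_nor, b_and, b_nor


# Both MSPs are pure sign extension, so the MSP adder can be disabled.
print(detect(0x00001234, 0xFFFF8000)[0])  # 0
# The MSP of the first operand carries data, so the MSP adder is needed.
print(detect(0x00011234, 0x00000001)[0])  # 1
```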

The Boolean expressions for the Sign and Carrctrl signals are obtained from the Karnaugh maps shown in Figs. 5 and 6. Table 2 lists Sign, Carrctrl, Close, A_AND, B_AND, A_NOR, and B_NOR for the eight possible combinations of the inputs A and B.

Fig. 5 Karnaugh map of Carrctrl expression

Fig. 6 Karnaugh map of Sign expression

Table 2 Sign, Carrctrl, and Close computation for eight combinations of inputs A and B

The expressions for Carrctrl and Sign derived from the Karnaugh maps are given in Eqs. 7 and 8.

$$\begin{array}{rl}Carrctrl=&\left(\overline{C_{LSP}}\cdot\overline{A_{AND}}\cdot A_{NOR}\cdot B_{AND}\cdot\overline{B_{NOR}}\right)+\\&\left(\overline{C_{LSP}}\cdot A_{AND}\cdot\overline{A_{NOR}}\cdot\overline{B_{AND}}\cdot B_{NOR}\right)+\\&\left(C_{LSP}\cdot\overline{A_{AND}}\cdot A_{NOR}\cdot\overline{B_{AND}}\cdot B_{NOR}\right)+\\&\left(C_{LSP}\cdot A_{AND}\cdot\overline{A_{NOR}}\cdot B_{AND}\cdot\overline{B_{NOR}}\right)\end{array}$$
(7)
$$\begin{array}{rl}Sign=&\overline{C_{LSP}}\cdot\left(\overline{A_{AND}}\cdot A_{NOR}\cdot B_{AND}\cdot\overline{B_{NOR}}+A_{AND}\cdot\overline{A_{NOR}}\cdot\overline{B_{AND}}\cdot B_{NOR}+\right.\\&\left.A_{AND}\cdot\overline{A_{NOR}}\cdot B_{AND}\cdot\overline{B_{NOR}}\right)+\\&C_{LSP}\cdot A_{AND}\cdot\overline{A_{NOR}}\cdot B_{AND}\cdot\overline{B_{NOR}}\end{array}$$
(8)

The detection logic circuit is shown in Fig. 7, and Fig. 8 shows the 32-bit SPST adder design. The two 32-bit inputs A and B in this design are split into the MSP and LSP.

Fig. 7 Detection logic circuit design

Fig. 8 SPST adder

The LSP adder computes the LSP independently. Latches, implemented with AND gates, are employed in the MSP component to control the inputs to the MSP adder. If an MSP computation is required, the latches allow the two MSP inputs to enter the adder; otherwise, they freeze the MSP inputs so that the MSP adder receives zero inputs. The detection logic circuit also receives these MSP inputs and uses them to determine whether to activate or deactivate the MSP.

The detection logic circuit enables the latches to provide MSP inputs to the MSP adder only if MSP computation is needed. If MSP computation is not needed, the detection logic circuit disables the latches and zero inputs are provided to the MSP adder. The resulting MSP sum is then compensated by the sign extension circuit, which receives the three signals from the detection logic circuit as inputs.
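The behavior of the adder can be illustrated with a small arithmetic model. This is a sketch under stated assumptions, not the circuit of Fig. 8: it assumes that, when the MSP is frozen, Carrctrl supplies the least significant bit of the MSP result and Sign fills the remaining 15 bits, and it derives both signals from two's complement arithmetic rather than from the Karnaugh maps. The name `spst_add` is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def spst_add(a, b):
    """Behavioral model of a 32-bit SPST addition.

    Returns (sum modulo 2**32, msp_active).
    """
    a &= MASK32
    b &= MASK32

    # The LSP adder always runs and produces the carry into the MSP.
    lsp = (a & MASK16) + (b & MASK16)
    c_lsp = lsp >> 16

    a_msp, b_msp = a >> 16, b >> 16
    a_and, a_nor = a_msp == MASK16, a_msp == 0
    b_and, b_nor = b_msp == MASK16, b_msp == 0
    close = not ((a_and or a_nor) and (b_and or b_nor))  # Eq. 6

    if close:
        # Latches pass the MSP operands to the MSP adder.
        msp = (a_msp + b_msp + c_lsp) & MASK16
    else:
        # MSP adder is frozen; the result is rebuilt from two signals.
        # Assumption: carrctrl is the MSP LSB, sign fills the upper 15 bits.
        carrctrl = c_lsp ^ a_and ^ b_and  # XOR form, equivalent to Eq. 7 here
        sign = (a_and and b_and) or (not c_lsp and (a_and or b_and))
        msp = (0xFFFE if sign else 0) | carrctrl
    return (msp << 16) | (lsp & MASK16), close


total, active = spst_add(0x0000FFFF, 0xFFFF0001)
print(hex(total), active)  # 0x0 False
```

In the example the MSPs are all zeros and all ones, the LSP addition produces a carry, and the compensated MSP is zero without the MSP adder being used.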

Partial product candidate generation based on CSD magnitude and sign values

The block diagram of this architecture features a partial product generator that takes in a 16-bit input A and generates three possible partial products: A, 0, and −A. The sign and magnitude values produced by the CSD recoder block are used to select which of these 16-bit partial products to use. A second 16-bit input, B, is fed into the recoder block, which generates a 17-digit recoded output represented by 17-bit magnitude and sign values. The output of the partial product generator block is 17 partial products, each 16 bits long, as depicted in Fig. 9.

Fig. 9 Partial product candidate generation

Selection of partial products block

In the previous stage, 17 partial products of 16 bits each were generated and passed to the sign extension and shifting block. In this block, each partial product is sign-extended to 32 bits, so that every step of the multiplication is carried out on 32-bit values regardless of the actual length of the partial product. After sign extension, the shifting operation is performed: every partial product except the first is shifted left, each by one more bit position than the one before it. The resulting partial products are then passed to the selection of partial products block, where only nine of the 17 are selected, because the CSD recoded output has nonzero values in at most about n/2 of its digits. Only those partial products whose recoded digit is 01 or 11 in the recoder output are therefore selected, as shown in Fig. 10.

Fig. 10 Selection of partial products
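Candidate selection, sign extension, shifting, and selection of the nonzero partial products can be sketched together as follows. The sketch assumes that the partial product for digit i is shifted left by i positions and that masking to 32 bits stands in for sign extension in two's complement; the function name `partial_products` and the digit list are illustrative.

```python
def partial_products(a, digits, out_bits=32):
    """Return the selected (nonzero-digit) partial products of a.

    `digits` holds the CSD digits of the multiplier, LSB first,
    each being -1, 0, or +1.
    """
    mask = (1 << out_bits) - 1
    candidates = {1: a, 0: 0, -1: -a}  # the three candidates A, 0, -A
    selected = []
    for i, d in enumerate(digits):
        # Shift by the digit position; masking keeps the two's complement
        # pattern of a sign-extended 32-bit value.
        pp = (candidates[d] << i) & mask
        if d != 0:  # keep only partial products of nonzero digits
            selected.append(pp)
    return selected


# CSD digits of 7 (8 - 1), padded to 17 digits.
digits_of_7 = [-1, 0, 0, 1] + [0] * 13
pps = partial_products(5, digits_of_7)
print(len(pps), sum(pps) & 0xFFFFFFFF)  # 2 35
```

Only two of the 17 partial products survive selection in this example, and their sum modulo 2^32 is 5 × 7 = 35.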

Adding of partial products

After the nine partial products are selected, all except the ninth are fed into the SPST adders in pairs. The four resulting sums from the SPST adders are then passed into the two conventional adders, as illustrated in Fig. 11. To obtain the final result, the sum from the conventional adders is added to the ninth partial product using a conventional adder.

Fig. 11 Adding of partial products block
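The adder arrangement can be modeled as a three-stage reduction. In this sketch every adder is replaced by plain 32-bit modular addition, unused inputs are padded with zeros, and the outputs of the two conventional adders are assumed to be combined before the ninth partial product is added; these choices and the name `sum_partial_products` are assumptions.

```python
MASK32 = 0xFFFFFFFF


def add32(x, y):
    """32-bit addition with wraparound, standing in for one adder."""
    return (x + y) & MASK32


def sum_partial_products(selected):
    """Add up to nine 32-bit partial products in the staged order."""
    if len(selected) > 9:
        raise ValueError("at most nine partial products are expected")
    pps = list(selected) + [0] * (9 - len(selected))  # pad unused inputs

    # Stage 1: four adders (SPST adders in the design) take pairs 1-8.
    stage1 = [add32(pps[i], pps[i + 1]) for i in range(0, 8, 2)]
    # Stage 2: two conventional adders combine the four sums.
    stage2 = [add32(stage1[0], stage1[1]), add32(stage1[2], stage1[3])]
    # Final stage: combine both sums, then add the ninth partial product.
    return add32(add32(stage2[0], stage2[1]), pps[8])


print(sum_partial_products([0xFFFFFFFB, 40]))  # 35
```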

Power and area-efficient 256 FFT architecture

The fast Fourier transform (FFT) technique is a popular DSP method used to convert signals from the time domain to the frequency domain and vice versa. However, FFT involves a large number of multiplication operations, which can consume significant power. To address this issue, a power and area-efficient architecture for a 256-point FFT can be developed by employing the SPST-enabled CSD multiplier technique, as illustrated in Fig. 12 [10]. In this architecture, the complex multiplier block utilizes an SPST-enabled CSD multiplier to perform the multiplication of the twiddle factors.

Fig. 12 256-point FFT architecture
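To indicate where the multipliers sit in such an architecture, the following sketch multiplies a complex sample by a twiddle factor using four real multiplications, each of which would map to one multiplier instance. The Q15 fixed-point format, the rounding, and the function names are assumptions made for illustration and are not taken from the architecture of Fig. 12.

```python
import math

Q = 15  # assumed fixed-point format: Q15 twiddle factors


def twiddle_q15(k, n=256):
    """Return (real, imag) of exp(-2*pi*j*k/n) scaled to Q15 integers."""
    angle = -2 * math.pi * k / n
    scale = (1 << Q) - 1
    return round(math.cos(angle) * scale), round(math.sin(angle) * scale)


def complex_mul_q15(xr, xi, wr, wi):
    """Multiply (xr + j*xi) by a Q15 twiddle factor (wr + j*wi).

    Uses four real multiplications and two additions.
    """
    re = (xr * wr - xi * wi) >> Q
    im = (xr * wi + xi * wr) >> Q
    return re, im


wr, wi = twiddle_q15(64)                  # quarter turn: approximately -j
print(complex_mul_q15(1000, 0, wr, wi))   # (0, -1000)
```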

Results and discussion

The Verilog code for the SPST-based CSD multiplier was implemented using Cadence with 90-nm technology. Figures 13 and 14 depict the RTL design and the output waveform of the SPST adder, respectively. It can be observed from the waveform that the MSP computation is not performed during the negative cycle, even though the LSP output is present. Instead, MSP operations are carried out on the positive edge of the clock. As a result, the final sum is determined during the positive edge of the clock.

Fig. 13 RTL schematic of the SPST adder

Fig. 14 Output waveform of SPST adder

Table 3 presents the performance metrics of the SPST adder, and Table 4 compares the dynamic power consumption of the MSP portion of the SPST adder with that of a ripple carry adder. Table 4 shows a marked reduction in dynamic power consumption for the MSP adder. This decrease arises because the MSP adder does not add the two MSP inputs when they do not affect the result. Instead, the detection logic unit of the SPST adder compensates the MSP sum. As a result, unnecessary switching activity in the MSP is reduced, resulting in lower dynamic power consumption.

Table 3 Performance parameters of the SPST adder
Table 4 Comparison of SPST adder and ripple carry adder in terms of dynamic power

Figure 15 shows the output waveform of the signed multiplication for all signed input combinations. The performance parameters of the SPST-based CSD multiplier are shown in Table 5. A 256-point FFT architecture was implemented using the SPST-enabled CSD multiplier and, for comparison, using the Baugh-Wooley multiplier. The comparison shows that the SPST-enabled CSD multiplier consumes less power and area than the Baugh-Wooley multiplier. The results are shown in Table 6.

Fig. 15 Output waveform for signed multiplication

Table 5 Performance parameters of the SPST-based CSD multiplier
Table 6 Comparison of power and area of modified system for signed multiplication

Conclusions

The canonical signed digit is a number representation commonly used in arithmetic operations, in which the original number is recoded to produce a new one with the fewest possible nonzero digits. This method has the advantage of ensuring that no more than about n/2 of the digits are nonzero. The SPST adder incorporates both LSP and MSP adders in its design, with the MSP component turned off depending on the dynamic range of the inputs. This feature contributes to a reduction in dynamic power dissipation.

Once implemented, the SPST adder demonstrated a significant reduction in dynamic power consumption. When 50% of the input combinations have a dynamic range of 16 bits out of 32 bits, so that the MSP adder is enabled for only 50% of input combinations, the power dissipation decreased by 38.5% compared to the ripple carry adder. The proposed multiplier consumed 0.561 mW of power. Furthermore, the proposed system was applied to the power and area-efficient 256-point FFT architecture, leading to an 86.6% reduction in overall power consumption compared to the same application using the Baugh-Wooley multiplier.