# An area-optimized *N*-bit multiplication technique using *N*/2-bit multiplication algorithm


## Abstract

A novel design for an optimized *N*-bit multiplier is proposed and implemented using a modified divide-and-conquer technique. The conventional technique requires four *N*/2-bit multipliers to perform *N*-bit multiplication, whereas the proposed design uses a single multiplier module in hardware to do the work of all four. The multiplier module uses the Dadda algorithm. The design was implemented in Verilog HDL, and simulation results agreed closely with the expected products, verifying its functionality. It was also synthesized on several FPGAs, including Spartan 3E, Virtex-5 and Virtex-7. Post-place-and-route performance summaries show that the proposed approach significantly reduces hardware utilization. Furthermore, the proposed design is almost 75% more efficient than the conventional design in terms of resource utilization and operating frequency.

## Keywords

Multiplier · Area efficient · Divide and conquer

## 1 Introduction

Multiplication is one of the most common and critical operations in many applications. Commonly used architectures include the Baugh-Wooley and Booth multipliers; for signed numbers, the Baugh-Wooley multiplier is typically preferred [1]. Radix-2 and radix-4 Booth multipliers were implemented in [2] for 8-bit and 16-bit multiplication, and the radix-4 version was reported to use fewer resources and to run faster. The Vedic algorithm can be applied to complex mathematical problems and to logic design [3], and it is fast and low-power [4]. It offers 16 distinct methods (sutras) for solving a wide range of mathematical problems. Researchers have also used the Urdhva-Tiryakbhyam and Nikhilam multiplication algorithms: the former is fast because the partial products are generated and added concurrently [5], while the latter is more efficient in hardware utilization [6]. The array multiplier is the simplest architecture, but it adds its partial products row by row rather than in a reduction tree, so it consumes more time and resources than tree multipliers [7]. The Wallace tree is another fast and widely used algorithm that is well suited to pipelining [8]. Xilinx synthesis tools also provide modes that optimize area, delay and power; in [9] these modes were applied to Dadda, Booth, array and Wallace multipliers, which behaved differently under the balanced, area-optimized, timing-performance and power-optimized settings.

In [10], a reconfigurable digit-serial multiplier is proposed that uses clock gating to reduce power, but that work does not optimize FPGA resource consumption. A two-dimensional bypassing technique is used in [11] to design a multiplier, with a focus on power consumption and delay, whereas the approach presented in this article optimizes FPGA resource consumption. In [12], a multiplexer-based 8-bit multiplier operating at 50 MHz is presented, whereas the proposed architecture reaches 320 MHz for 16-bit multiplication. E. George Walters III presents array multipliers built from six-input LUTs and shift-register LUTs [13], whereas this article targets four-input LUTs. Although modern FPGAs have built-in multipliers, configurable LUT-based multipliers still play a vital role in many applications because their size, placement and structure can be adapted freely [13]. Many researchers have designed multipliers, as reviewed in this section, but they have not explored reusing the same resources through iteration. Some advanced digital signal processing (DSP) applications demand substantial resources. The significance of the proposed work lies in reusing its multiplier module over multiple iterations: by reducing the resources needed for multiplication, it lets the designer dedicate more resources to other modules in complex applications.

The literature indicates that the Dadda algorithm is the fastest of these designs [14], while the array multiplier has the longest delay. Moreover, the comparison of Dadda and Wallace tree multipliers in [15] concluded that the Dadda multiplier outperforms the Wallace tree multiplier in both speed and complexity. The Dadda tree-reduction algorithm is typically used to reduce the propagation delay of the partial-product addition.

The Dadda algorithm can be used for 16-bit or wider multiplication, but its complexity grows with the number of bits. For a 4-bit Dadda multiplier, the maximum partial-product tree height is four and there are three reduction stages [16]. When the same algorithm is used for 8-bit multiplication, the tree height increases to six and the number of reduction stages to four [17], increasing both resource consumption and delay. To address this issue, this article develops a hybrid technique that performs 16-bit multiplication using an 8-bit Dadda multiplier and the divide-and-conquer technique.
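To make the growth in complexity concrete, the short Python sketch below (Python is our choice for illustration; the paper's design is in Verilog) computes the tallest partial-product column of an *N* × *N* array and the standard Dadda target heights \(d_1 = 2,\ d_{j+1} = \lfloor 1.5\,d_j \rfloor\) below it, one target per reduction stage. It counts only column-compression stages; the figures quoted above come from [16] and [17] and may follow a different counting convention (for example, including the final carry-propagate addition), so the numbers need not match one-to-one. For 8-bit operands the tallest column holds eight bits and the first target height is six.

```python
def max_column_height(n_bits):
    """Tallest column in the partial-product array of an n_bits x n_bits multiply."""
    heights = [0] * (2 * n_bits - 1)
    for i in range(n_bits):          # bit i of the multiplier
        for j in range(n_bits):      # bit j of the multiplicand
            heights[i + j] += 1      # partial-product bit a_j*b_i has weight 2^(i+j)
    return max(heights)


def dadda_targets(n_bits):
    """Dadda target heights below the tallest column, largest (first stage) first.

    Uses d_1 = 2 and d_{j+1} = floor(1.5 * d_j); each target is one compression stage.
    """
    tallest = max_column_height(n_bits)
    targets = []
    d = 2
    while d < tallest:
        targets.append(d)
        d = d * 3 // 2
    return targets[::-1]


for n in (4, 8, 16):
    stages = dadda_targets(n)
    print(f"{n}-bit: tallest column {max_column_height(n)}, "
          f"{len(stages)} compression stages, targets {stages}")
```

Each doubling of the operand width adds two compression stages in this count, which is the growth that motivates reusing a smaller 8-bit module.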

## 2 Multiplication technique

The divide-and-conquer technique splits each *N*-bit number into two *N*/2-bit numbers. It then executes a series of *N*/2-bit multiplications and adds the resulting partial products (PP) [18].
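For reference, the standard divide-and-conquer split can be written as follows, with \(A_{\rm H}\), \(A_{\rm L}\) the upper and lower *N*/2-bit halves of *A* (likewise for *B*). Numbering the partial products in the row order of the MUX table in Sect. 3.2 is our assumption; this block is a summary, not a reproduction of the paper's numbered equations.

```latex
\begin{aligned}
A &= A_{\rm H}\,2^{N/2} + A_{\rm L}, \qquad B = B_{\rm H}\,2^{N/2} + B_{\rm L},\\
{\rm PP}_1 &= A_{\rm L} B_{\rm L}, \quad {\rm PP}_2 = A_{\rm L} B_{\rm H}, \quad
{\rm PP}_3 = A_{\rm H} B_{\rm L}, \quad {\rm PP}_4 = A_{\rm H} B_{\rm H},\\
A \cdot B &= {\rm PP}_4\,2^{N} + \left({\rm PP}_2 + {\rm PP}_3\right) 2^{N/2} + {\rm PP}_1 .
\end{aligned}
```

Expanding \((A_{\rm H}2^{N/2} + A_{\rm L})(B_{\rm H}2^{N/2} + B_{\rm L})\) gives exactly these four terms, which is why four *N*/2-bit products suffice for one *N*-bit product.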

Multiple adders are used to evaluate Eqs. (9–11), and the carry of each stage is added to the next stage; the adder block is detailed in Sect. 3.5. The same concept extends to wider operands.
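A behavioural Python model of the conventional scheme helps check the recombination arithmetic. It is a minimal sketch assuming unsigned operands and an even `n_bits`; the four products stand in for the four dedicated *N*/2-bit multipliers, and Python's integer addition stands in for the adder chain, so the stage-by-stage carry handling is not modelled. The helper names are our own.

```python
import random


def split_halves(x, half_bits):
    """Return (high, low) halves of an unsigned integer, half_bits each."""
    mask = (1 << half_bits) - 1
    return x >> half_bits, x & mask


def divide_and_conquer_multiply(a, b, n_bits=16):
    """Unsigned n_bits x n_bits product built from four n_bits/2 products."""
    half = n_bits // 2
    a_h, a_l = split_halves(a, half)
    b_h, b_l = split_halves(b, half)
    # Conventional scheme: four separate N/2-bit multipliers working in parallel.
    pp1 = a_l * b_l
    pp2 = a_l * b_h
    pp3 = a_h * b_l
    pp4 = a_h * b_h
    # Recombine: PP4 * 2^N + (PP2 + PP3) * 2^(N/2) + PP1
    return (pp4 << n_bits) + ((pp2 + pp3) << half) + pp1


# Compare against Python's built-in multiplication on random 16-bit operands.
rng = random.Random(2019)
for _ in range(10_000):
    a, b = rng.getrandbits(16), rng.getrandbits(16)
    assert divide_and_conquer_multiply(a, b) == a * b
print("10,000 random 16-bit products match")
```

Passing `n_bits=32` exercises the same recombination for wider operands.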

## 3 Proposed optimized design

Conventionally, the four multiplications of the divide-and-conquer technique are implemented with four dedicated *N*/2-bit multipliers that operate simultaneously: the inputs reach all four modules at the same time, and each module produces one partial product. A more economical approach is to reuse the allocated resources over successive time intervals instead of dedicating each block of the design to a single iteration.

We propose a novel design that realizes an *N*-bit multiplier with only one *N*/2-bit multiplier module. The concept is demonstrated for 16-bit multiplication using a single 8-bit multiplier module, combining the divide-and-conquer mechanism with the Dadda algorithm. The architecture of the proposed design is shown in Fig. 2.

### 3.1 Counter

A 2-bit up counter synchronizes the multiplier inputs with their respective storage registers. It wraps back to zero on overflow. The counter outputs drive both the MUX select lines and the decoder inputs.
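A behavioural Python sketch of the counter (class and method names are hypothetical). The MUX and decoder tables below list Count[0] before Count[1]; reading their rows top to bottom as counter values 0 to 3 treats Count[0] as the more significant bit, and that ordering is our assumption. Swapping it would only change whether \({\rm PP}_{2}\) or \({\rm PP}_{3}\) is produced second, which does not affect the final sum.

```python
class TwoBitCounter:
    """2-bit up counter that wraps to zero on overflow: 0, 1, 2, 3, 0, ..."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Advance one clock edge and return the new count."""
        self.value = (self.value + 1) % 4  # 3 + 1 overflows back to 0
        return self.value

    def select_bits(self):
        """Return (Count[0], Count[1]), treating Count[0] as the MSB (assumption)."""
        return (self.value >> 1) & 1, self.value & 1


counter = TwoBitCounter()
states = [counter.select_bits()]
for _ in range(4):
    counter.tick()
    states.append(counter.select_bits())
print(states)  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)]
```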

### 3.2 Multiplexers

Because the proposed approach has only one multiplier module to produce all four partial products, the multiplier does not take the inputs directly. Instead, two multiplexers feed it the appropriate halves of A and B so that the partial products are generated over four iterations. Inputs A and B are connected to the multiplexers as shown in Fig. 2.

MUX outputs for each counter state

| Count[0] | Count[1] | MUX 1 output | MUX 2 output |
|---|---|---|---|
| 0 | 0 | \(A_{\rm L}\) | \(B_{\rm L}\) |
| 0 | 1 | \(A_{\rm L}\) | \(B_{\rm H}\) |
| 1 | 0 | \(A_{\rm H}\) | \(B_{\rm L}\) |
| 1 | 1 | \(A_{\rm H}\) | \(B_{\rm H}\) |
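The table reduces to two independent 2-to-1 selections: MUX 1 is steered by Count[0] and MUX 2 by Count[1]. The Python function below is a hypothetical behavioural model (8-bit halves assumed, as in the 16-bit case study), not the paper's Verilog.

```python
def mux_outputs(a, b, count0, count1, half_bits=8):
    """Return (MUX 1 output, MUX 2 output) for one row of the MUX table."""
    mask = (1 << half_bits) - 1
    a_high, a_low = a >> half_bits, a & mask
    b_high, b_low = b >> half_bits, b & mask
    mux1 = a_high if count0 else a_low   # Count[0] selects A_H or A_L
    mux2 = b_high if count1 else b_low   # Count[1] selects B_H or B_L
    return mux1, mux2


a, b = 0xABCD, 0x1234
for count0, count1 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    m1, m2 = mux_outputs(a, b, count0, count1)
    print(count0, count1, hex(m1), hex(m2))
```

With these operands the rows print the halves 0xcd/0xab for A and 0x34/0x12 for B, in the same pattern as the table.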

### 3.3 Decoder

Decoder and data registers (decoder inputs are the counter bits)

| Count[0] | Count[1] | Decoder output | Active data register |
|---|---|---|---|
| 0 | 0 | Y0 | PP1 |
| 0 | 1 | Y1 | PP2 |
| 1 | 0 | Y2 | PP3 |
| 1 | 1 | Y3 | PP4 |
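Read as a binary number with Count[0] in the more significant position, each row of the decoder table selects Y0 to Y3 in order, and exactly one output is high, so one data register is active per iteration. A hypothetical Python model of that 2-to-4 decoder; treating the active register as the one that latches the multiplier output is our reading of the "Active data register" column.

```python
REGISTERS = ("PP1", "PP2", "PP3", "PP4")  # register enabled by Y0, Y1, Y2, Y3


def decode(count0, count1):
    """2-to-4 decoder returning one-hot (Y0, Y1, Y2, Y3)."""
    index = (count0 << 1) | count1  # (0,1) -> Y1 and (1,0) -> Y2, as in the table
    return tuple(int(k == index) for k in range(4))


def active_register(count0, count1):
    """Name of the data register that latches the multiplier output."""
    return REGISTERS[decode(count0, count1).index(1)]


for count0, count1 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(count0, count1, decode(count0, count1), active_register(count0, count1))
```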

### 3.4 Multiplier module

The 8-bit Dadda multiplier performs four multiplications to produce \({\rm PP}_{1}\), \({\rm PP}_{2}\), \({\rm PP}_{3}\) and \({\rm PP}_{4}\) according to Eqs. (4–7); the Dadda tree keeps the propagation delay of each partial-product addition low. Fig. 3 shows the finite state machine of the proposed multiplication sequence, which uses the counter's output sequence to generate the partial products.
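Fig. 3 is not reproduced here, but the sequence it describes can be sketched behaviourally: one *N*/2-bit multiplier is reused for four counter states, the MUX table chooses its operands, and the decoder table chooses which register stores each result. This Python model assumes one partial product per clock cycle, reads Count[0] as the more significant counter bit (an assumption), and uses Python's `*` as a stand-in for the 8-bit Dadda multiplier; the function name and the `multiplier_uses` bookkeeping are illustrative only.

```python
import random


def sequential_multiply(a, b, n_bits=16):
    """Unsigned product using a single n_bits/2 multiplier over four iterations.

    Returns (product, multiplier_uses) so the reuse count is visible.
    """
    half = n_bits // 2
    mask = (1 << half) - 1
    registers = {}
    multiplier_uses = 0
    for count in range(4):                             # 2-bit counter states 0..3
        count0, count1 = count >> 1, count & 1         # Count[0] read as the MSB (assumed)
        mux1 = (a >> half) if count0 else (a & mask)   # MUX 1: A_H or A_L
        mux2 = (b >> half) if count1 else (b & mask)   # MUX 2: B_H or B_L
        result = mux1 * mux2                           # stand-in for the Dadda multiplier
        multiplier_uses += 1
        registers[(count0 << 1) | count1] = result     # decoder enables one register
    pp1, pp2, pp3, pp4 = (registers[k] for k in range(4))
    product = (pp4 << n_bits) + ((pp2 + pp3) << half) + pp1
    return product, multiplier_uses


# Exhaustive check for 8-bit operands (4-bit halves), random check for 16-bit.
for a in range(256):
    for b in range(256):
        assert sequential_multiply(a, b, n_bits=8)[0] == a * b
rng = random.Random(7)
for _ in range(10_000):
    a, b = rng.getrandbits(16), rng.getrandbits(16)
    assert sequential_multiply(a, b) == (a * b, 4)
print("all checks passed")
```

The result is identical to the four-multiplier version; only the number of multiplier instances changes, at the cost of four passes through the shared module.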

### 3.5 Adders

The partial products are added with ripple-carry adders, which are area-efficient and simple [19, 20]. Multiple adders combine the partial products into the final product according to Eqs. (8–11). Because each adder accepts only two operands at a time, the addition is split into several steps, and the proposed design uses five 8-bit adders, as shown in Fig. 2.
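Fig. 2 is not reproduced here, so the exact wiring of the five 8-bit adders is not known from this text. The Python sketch below shows one arrangement that satisfies the stated constraints (two-operand 8-bit ripple-carry adders, with each carry passed to the next stage) and happens to need exactly five adders for the 16-bit case; it is our assumption, not the authors' circuit. The lowest product byte is \({\rm PP}_{1}\)[7:0] and needs no adder.

```python
import random


def full_adder(x, y, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return s, carry_out


def ripple_carry_add8(x, y, carry_in=0):
    """8-bit ripple-carry adder: the carry ripples from bit 0 to bit 7."""
    total, carry = 0, carry_in
    for i in range(8):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total, carry


def combine_partial_products(pp1, pp2, pp3, pp4):
    """Add four 16-bit partial products with five 8-bit adders (assumed arrangement)."""
    lo = 0xFF
    # Adders 1-2: S = PP2 + PP3 as a 17-bit value (s_low, s_high, s_top)
    s_low, c1 = ripple_carry_add8(pp2 & lo, pp3 & lo)
    s_high, s_top = ripple_carry_add8(pp2 >> 8, pp3 >> 8, c1)
    # Adder 3: product bits 15..8 = PP1[15:8] + S[7:0]
    p_15_8, c3 = ripple_carry_add8(pp1 >> 8, s_low)
    # Adder 4: product bits 23..16 = PP4[7:0] + S[15:8] + carry
    p_23_16, c4 = ripple_carry_add8(pp4 & lo, s_high, c3)
    # Adder 5: product bits 31..24 = PP4[15:8] + S[16] + carry
    p_31_24, c5 = ripple_carry_add8(pp4 >> 8, s_top, c4)
    if c5:
        raise ValueError("inputs are not partial products of 8-bit halves")
    return (p_31_24 << 24) | (p_23_16 << 16) | (p_15_8 << 8) | (pp1 & lo)


rng = random.Random(1)
for _ in range(5_000):
    a, b = rng.getrandbits(16), rng.getrandbits(16)
    a_h, a_l, b_h, b_l = a >> 8, a & 0xFF, b >> 8, b & 0xFF
    assert combine_partial_products(a_l * b_l, a_l * b_h, a_h * b_l, a_h * b_h) == a * b
```

Because a 16 × 16 product always fits in 32 bits, the carry out of the last adder is zero for genuine partial products; the check only guards against misuse.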

## 4 Results and comparison

The proposed algorithm was verified in multiple simulation environments. The design was implemented in Verilog HDL and tested on various sets of input operands, and the simulation results agreed with the theoretical products. It was also implemented on several FPGAs, Spartan 3E (xc3s500-5fg320), Virtex-7 (xc7vx485t-3ffg1157) and Virtex-5 (xc5vlx20t-2ff323), to compare resource utilization. The design is fully synthesizable, and all resource estimates were obtained after successful place and route. Table 3 summarizes the resource utilization and confirms the improved performance of the proposed design.

Performance summary of the proposed design (resource entries are used/available)

| FPGA resources | Spartan 3E xc3s500-5fg320 | Virtex-7 xc7vx485t-3ffg1157 | Virtex-5 xc5vlx20t-2ff323 |
|---|---|---|---|
| Occupied slices | 133/4656 | 43/75900 | 60/3120 |
| Slice registers | 62/9312 | 52/607200 | 52/12480 |
| Bonded IOBs | 67/232 | 67/600 | 67/172 |
| Slice LUTs | 228/9312 | 131/303600 | 131/12480 |
| Flip-flops | 62/9312 | 52/607200 | 52/12480 |
| DSP48E | – | 2/2800 | 2/24 |
| Min. period (ns) | 3.118 | 1.57 | 2.69 |
| Freq. (MHz) | 320.7 | 634.51 | 370.78 |
| Delay (ns) | 38.319 | 8.80 | 14.87 |

Comparative analysis of 16-bit multipliers with the proposed design

| | Proposed | [7] | [7] | [7] | [4] |
|---|---|---|---|---|---|
| Year | 2019 | 2016 | 2016 | 2016 | 2016 |
| FPGA | Spartan 3E xc3s500-5fg320 | Spartan 3E xc3s500-5fg320 | Spartan 3E xc3s500-5fg320 | Spartan 3E xc3s500-5fg320 | Spartan 3E |
| Algorithm | Dadda, divide & conquer | Array | Dadda | Wallace | Vedic |
| Slice registers | 132 | 493 | 493 | 493 | 493 |
| Bonded IOBs | 67 | 66 | 66 | 66 | 66 |
| 4-input LUTs | 228 | 844 | 899 | 1000 | 1243 |
| Flip-flops | 62 | 492 | 492 | 492 | – |
| Freq. (MHz) | 320.7 | 79.10 | 70.03 | 80.205 | – |
| Delay (ns) | 38.319 | 61.39 | 55.65 | 36.35 | 38.82 |

Because the proposed design produces only one partial product at a time, it needs more iterations to complete a multiplication, but it uses fewer resources; there is thus a trade-off between resource consumption and the number of iterations. Nevertheless, the architecture reaches a high clock frequency (320.7 MHz on Spartan 3E), so the iterations complete quickly. Hence, the design not only reduces resource utilization but is also fast compared with previous designs.
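A back-of-the-envelope check of this trade-off, using the minimum clock periods reported in the performance summary: if each of the four multiplier iterations takes one clock cycle (our assumption; any extra cycles for register loading or the final addition are ignored), the iterations occupy four minimum periods. The helper names are hypothetical.

```python
MIN_PERIOD_NS = {"Spartan 3E": 3.118, "Virtex-7": 1.57, "Virtex-5": 2.69}
ITERATIONS = 4  # one pass of the shared multiplier per partial product


def frequency_mhz(period_ns):
    """Maximum clock frequency implied by a minimum period (1000 / ns = MHz)."""
    return 1000.0 / period_ns


def iteration_time_ns(period_ns, iterations=ITERATIONS):
    """Time spent in the multiplier iterations, assuming one clock cycle each."""
    return iterations * period_ns


for device, period in MIN_PERIOD_NS.items():
    print(f"{device}: {frequency_mhz(period):.1f} MHz, "
          f"{iteration_time_ns(period):.2f} ns for {ITERATIONS} iterations")
```

For Spartan 3E this gives about 320.7 MHz, matching the table, and roughly 12.5 ns for the four iterations. The Virtex frequencies computed this way differ slightly from the reported ones (about 637 MHz versus 634.51 MHz for Virtex-7), presumably because the listed periods are rounded.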

## 5 Conclusion

This article has presented a novel multiplier design that modifies the divide-and-conquer algorithm and optimizes it for resource utilization, using the Dadda algorithm for the multiplication itself. The design reduces the number of hardware multiplier modules from four to one, so it needs roughly a quarter of the multiplier resources of the conventional approach. It can also operate at a higher frequency than previous designs, which makes it suitable for high-speed applications. It has been tested on several FPGAs to validate the results.

## Notes

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

## References

1. Badawi A, Alqarni A, Aljuffri A, BenSaleh MS, Obeid AM, Qasim SM (2015) FPGA realization and performance evaluation of fixed-width modified Baugh–Wooley multiplier. In: 2015 Third international conference on technological advances in electrical, electronics and computer engineering (TAEECE). IEEE, pp 55–158
2. Sukowati AI, Putra HD, Wibowo EP (2016) Usage area and speed performance analysis of Booth multiplier on its FPGA implementation. In: International conference on informatics and computing (ICIC). IEEE, pp 117–121
3. Narula U, Tripathi R, Wakhle G (2015) High speed 16-bit digital Vedic multiplier using FPGA. In: 2015 2nd international conference on computing for sustainable global development (INDIACom). IEEE, pp 121–124
4. Ram GC, Lakshmanna YR, Rani DS, Sindhuri KB (2016) Area efficient modified Vedic multiplier. In: 2016 international conference on circuit, power and computing technologies (ICCPCT). IEEE, pp 1–5
5. Kumar KS, Swathi M (2018) 128-bit multiplier with low-area high-speed adder based on Vedic mathematics. In: Proceedings of 2nd international conference on micro-electronics, electromagnetics and telecommunications. Springer, pp 163–172
6. Patil HP, Sawant S (2015) Design and implementation of energy efficient Vedic multiplier using FPGA. In: 2015 international conference on information processing (ICIP). IEEE, pp 206–210
7. Ram GC, Rani DS, Balasaikesava R, Sindhuri KB (2016) Design of delay efficient modified 16 bit Wallace multiplier. In: IEEE international conference on recent trends in electronics, information & communication technology (RTEICT). IEEE, pp 1887–1891
8. Martha P, Kajal N, Kumari P, Rahul R (2018) An efficient way of implementing high speed 4-bit advanced multipliers in FPGA. In: 2018 2nd international conference on electronics, materials engineering & nano-technology (IEMENTech). IEEE, pp 1–5
9. Swee KLS, Hiung LH (2012) Performance comparison review of 32-bit multiplier designs. In: 2012 4th international conference on intelligent and advanced systems (ICIAS), vol 2. IEEE, pp 836–841
10. Elsayed E, El-Boghdadi HM (2015) A novel power-efficient multi-operand digit-multiplier using reconfiguration and clock gating. J Supercomput 71:2539–2564
11. Srinivas KB, Aneesh YM (2014) Low power and high speed row and column bypass multiplier. In: 2014 IEEE international conference on computational intelligence and computing research, pp 1–4
12. Rashidi B, Sayedi SM, Farashahi RR (2014) Design of a low-power and low-cost Booth-shift/add multiplexer-based multiplier. In: 2014 22nd Iranian conference on electrical engineering (ICEE), pp 14–19
13. Walters EG (2016) Array multipliers for high throughput in Xilinx FPGAs with 6-input LUTs. Computers 5(4)
14. Buddhe V, Palsodkar P, Palsodakar P (2014) Design and verification of Dadda algorithm based binary floating point multiplier. In: 2014 international conference on communications and signal processing (ICCSP). IEEE, pp 1073–1077
15. Townsend WJ, Swartzlander EE Jr, Abraham JA. A comparison of Dadda and Wallace multiplier delays. In: Optical science and technology, SPIE's 48th annual meeting. International Society for Optics and Photonics, pp 552–560
16. Riaz MH, Ahmed SA, Javaid Q, Kamal T (2018) Low power 4 × 4 bit multiplier design using Dadda algorithm and optimized full adder. In: 2018 15th international Bhurban conference on applied sciences and technology (IBCAST), pp 392–396
17. Jeevan B, Narender S, Reddy CK, Sivani K (2013) A high speed binary floating point multiplier using Dadda algorithm. In: 2013 international multi-conference on automation, computing, communication, control and compressed sensing (iMac4s). IEEE, pp 455–460
18. Manolopoulos K, Reisis D, Chouliaras VA (2011) An efficient multiple precision floating-point multiplier. In: 2011 18th IEEE international conference on electronics, circuits and systems (ICECS). IEEE, pp 153–156
19. Shinde KD, Badiger S (2015) Analysis and comparative study of 8-bit adder for embedded application. In: 2015 international conference on control, instrumentation, communication and computational technologies (ICCICCT). IEEE, pp 279–283
20. Saini J, Agarwal S, Kansal A (2015) Performance, analysis and comparison of digital adders. In: 2015 international conference on advances in computer engineering and applications (ICACEA). IEEE, pp 80–83