1 Introduction

Multiplication by constants is a common operation in digital signal processing (DSP) systems. In hardware, a multiplication is demanding in terms of area and power consumption. However, the single constant multiplication (SCM) and multiple constant multiplication (MCM) operations can be implemented using only shifts, additions, and subtractions, with the last two usually referred to collectively as additions [1]. The SCM case is when an input is multiplied by a constant coefficient (Fig. 1a), and the MCM case is when an input is multiplied by a set of constant coefficients (Fig. 1b) [2]. Theoretical lower bounds for the number of adders and for the number of depth levels, i.e., the maximum number of serially connected adders (also known as the critical path), in SCM, MCM, and other constant multiplication blocks that are constructed with two-input adders under the shift-and-add scheme have been presented in [3]. Tighter lower bounds, as well as a new bound, namely, the one for the number of extra adders required to preserve the lowest number of depth levels, were presented in [4] for the SCM case. Nevertheless, there are no theoretical lower bounds for constant multiplication blocks that include multiple-input additions/subtractions and pipeline registers in the involved arithmetic operations. This type of operation has become very important, mainly when the pipelined constant multiplication blocks are implemented in the increasingly demanded field programmable gate array (FPGA) platforms. This is because the logic blocks of FPGAs include memory elements, and thus, pipelining results in low extra cost [5,6,7,8,9,10,11,12]. In addition, the use of three-input adders has started to gain importance, since the logic blocks of the newest families of FPGAs are bigger and can fit more complex adders using nearly the same amount of hardware resources [10,11,12].

Fig. 1 Block diagram of (a) SCM and (b) MCM

Particularly, in the last two decades, many efficient high-level synthesis algorithms have been introduced for the multiplierless design of constant multiplication blocks. The common cost function to be minimized in these algorithms is the number of arithmetic operations (additions and subtractions) needed to implement the multiplications. Nevertheless, the critical path has the main negative impact on speed and power consumption [13,14,15,16,17,18]. Therefore, substantial recent research has targeted both application-specific integrated circuits (ASICs) [19,20,21] and FPGAs [5,6,7,8,9,10,22,23,24,25], where the ultimate goal is to minimize the number of arithmetic operations subject to a minimum number of depth levels.

On the other hand, even though ASICs still provide higher performance and lower power consumption, the increased development time and manufacturing cost that come with smaller CMOS transistor technologies have opened a large market for FPGAs. The FPGA technology provides signal processing engineers with the ability to construct a custom data path that is tailored to the application at hand [26, 27]. FPGAs offer the flexibility of instruction set digital signal processors, while providing the processing power and flexibility of an ASIC, and enable significant design cycle compression and time-to-market advantages, an important consideration in an economic climate with ever-decreasing market windows and short product life cycles [28, 29].

The novelty of this paper is the introduction of theoretical lower bounds for the number of operations necessary to implement pipelined single constant multiplication (PSCM) and pipelined multiple constant multiplication (PMCM) blocks that are constructed with the shift-and-add scheme. For the derivation of these bounds, we consider that an n-input (where n is an integer) pipelined addition/subtraction and a single pipeline register have the same cost. As mentioned earlier, this assumption fits particularly well for cases where n is set equal to 3 and the target platforms for implementation are the newest FPGAs from the two most dominant manufacturers, Xilinx and Altera. However, it is worth highlighting that n = 2 is still in common use in many applications. This contribution is important because the optimality of different algorithms that reduce the number of operations in PSCM and PMCM blocks can be tested using appropriate theoretical lower bounds. Additionally, these bounds can be useful to develop new algorithms.

The paper is organized as follows. In the next section, definitions and methods needed to address the proposal are given. Section 3 presents the new theoretical lower bounds along with theorems and proofs to support the derivation of these bounds. Comparisons with previous theoretical lower bounds from [3] and [4] are provided in Section 4. Finally, conclusions are given in Section 5.

2 Definitions of terms

The constant multiplications referred to here are expressed in fixed-point arithmetic because implementations in this number representation have higher speed and lower cost, and are therefore commonly employed in DSP algorithms [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,30,31,32,33,34,35,36,37,38,39,40]. Only positive, odd, integer constants are considered since this is a useful simplification that does not affect the formulation of constant multiplication problems. In this sense, a constant can be expressed simply in binary form, as follows:

$$ c={\displaystyle \sum_{i=0}^{B-1}{b}_i{2}^i}, $$

where \( b_i \in \{0, 1\} \) is the i-th bit and B is the word length [31]. We can express the product of a variable input X by a constant c with the shift-and-add approach using the binary representation of that constant to dictate the multiplier structure. For example, the product 47X, with \( 47 = 2^5 + 2^3 + 2^2 + 2^1 + 2^0 \) (i.e., a binary string “101111”), needs four additions and has a critical path of three additions, as shown in Fig. 2. The implementation cost of a shift-and-add constant multiplier is the number of arithmetic operations since products by powers of two are implemented as hardwired shifts with no practical cost.

Fig. 2 Implementation structure of the product 47X with constant 47 expressed in binary using n = 2 input adders
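As a rough illustration of the cost count above, the following Python sketch derives the number of two-input additions and the adder depth of a plain binary shift-and-add multiplier. The function names are hypothetical, and the depth figure assumes that the shifted copies of X are summed in a balanced tree, which is only our reading of the structure described for 47X.

```python
def binary_shift_add_cost(c: int) -> tuple[int, int]:
    """Return (additions, depth) for c*X built from the binary digits of c.

    Assumes two-input adders arranged as a balanced tree (an assumption
    made for this sketch); shifts are hardwired and cost nothing.
    """
    if c <= 0:
        raise ValueError("c must be a positive integer")
    ones = bin(c).count("1")          # non-zero bits = shifted copies of X
    additions = ones - 1              # summing k terms takes k - 1 adders
    depth = (ones - 1).bit_length()   # equals ceil(log2(ones)) for ones >= 1
    return additions, depth


def multiply_by_shifts(c: int, x: int) -> int:
    """Compute c*x using only shifts and additions, following the bits of c."""
    return sum(x << i for i in range(c.bit_length()) if (c >> i) & 1)


print(binary_shift_add_cost(47))   # (4, 3)
print(multiply_by_shifts(47, 3))   # 141
```

For 47 the sketch reproduces the four additions and the critical path of three additions quoted in the text.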

It is worth highlighting that additions and subtractions require practically the same amount of hardware resources. Hence, signed digit (SD) representations of a constant can reduce the aforementioned implementation cost because they employ negative digits, which represent subtractions. An SD representation of a constant is given in the form

$$ c={\displaystyle \sum_{i=0}^{B-1}{d}_i{2}^i}, $$

where \( d_i \in \{-1, 0, 1\} \), with “−1” usually expressed as \( \overline{1} \) [32]. Among them, the canonical signed digit (CSD) representation is convenient since its number of non-zero digits is the minimum number of signed digits (MNSD) [3]. Besides, each non-zero digit is followed by at least one zero, which makes the representation unique. The CSD form of a constant can be found from binary by iteratively substituting every string of k digits “1” (say, “1111”) with a string of k − 1 digits “0” between a “1” and a “−1” (the string “1111” becomes “\( 1000\overline{1} \)”). In this case, the product 47X, with \( 47 = 2^6 - 2^4 - 2^0 \) (i.e., a CSD string “\( 10\overline{1}000\overline{1} \)”), needs two subtractions and has two operations in its critical path, as shown in Fig. 3.

Fig. 3 Implementation structure of the product 47X with constant 47 expressed in CSD using two-input adders
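The conversion to CSD can be sketched in a few lines of Python. Note that this sketch does not use the string substitution described above; it uses the equivalent, widely known non-adjacent-form rule (inspect the constant modulo 4 and emit +1 or −1), which is swapped in here because it is shorter to code. The function names and the use of the letter `n` for the digit −1 are assumptions of this example.

```python
def to_csd(c: int) -> list[int]:
    """Return the CSD digits of c, least-significant digit first."""
    if c <= 0:
        raise ValueError("c must be a positive integer")
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d            # after this, c is divisible by 4
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def csd_string(c: int) -> str:
    """Render the CSD digits most-significant first, writing -1 as 'n'."""
    symbols = {1: "1", 0: "0", -1: "n"}
    return "".join(symbols[d] for d in reversed(to_csd(c)))


digits = to_csd(47)
print(csd_string(47))                              # 10n000n
print(sum(d << i for i, d in enumerate(digits)))   # 47
print(sum(1 for d in digits if d))                 # 3
```

The three non-zero digits obtained for 47 correspond to the two subtractions mentioned in the text.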

In a constant multiplication block, the A-operation [30] represents two-input addition or subtraction along with shifts, and it is defined as

$$ {A}_q\left({u}_1,{u}_2\right)=\left|{2}^{l_1}{u}_1+{\left(-1\right)}^{s_2}{2}^{l_2}{u}_2\right|{2}^{- r}, $$

where \( l_1 \ge 0 \) and \( l_2 \ge 0 \) are left shifts, \( r \ge 0 \) is a right shift, \( s_2 \) is a binary value, i.e., \( s_2 \in \{0, 1\} \), q is the set of parameters (the so-called configuration) of the A-operation, i.e., \( q = \{l_1, l_2, r, s_2\} \), and \( u_1 \) and \( u_2 \) are odd integers. For three-input adders, the A-operation is [10]

$$ {A}_q\left({u}_1,{u}_2,{u}_3\right)=\left|{2}^{l_1}{u}_1+{\left(-1\right)}^{s_2}{2}^{l_2}{u}_2+{\left(-1\right)}^{s_3}{2}^{l_3}{u}_3\right|{2}^{- r}, $$

where \( l_1 \ge 0 \), \( l_2 \ge 0 \), and \( l_3 \ge 0 \) are left shifts, \( r \ge 0 \) is a right shift, \( s_2 \) and \( s_3 \) are binary values, \( q = \{l_1, l_2, l_3, s_2, s_3, r\} \) is the configuration of the A-operation, and \( u_1 \), \( u_2 \), and \( u_3 \) are odd integers. Generalizing to n inputs, the A-operation is expressed as

$$ {A}_q\left({u}_1,\dots, {u}_n\right)=\left|{2}^{l_1}{u}_1+{\displaystyle \sum_{i=2}^n{\left(-1\right)}^{s_i}{2}^{l_i}{u}_i}\right|{2}^{- r}, $$

where \( l_1 \ge 0, \dots, l_n \ge 0 \) are left shifts, \( r \ge 0 \) is a right shift, \( s_2, \dots, s_n \) are binary values, \( q = \{l_1, \dots, l_n, s_2, \dots, s_n, r\} \) is the configuration of the A-operation, and \( u_1, \dots, u_n \) are odd integers.
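A minimal Python sketch of the n-input A-operation follows. The function name `a_operation` and its argument layout are hypothetical choices made for this example; the only behaviour taken from the definition is the formula itself. The check that the right shift r discards no non-zero bits is an extra safeguard added here.

```python
def a_operation(inputs, left_shifts, signs, r=0):
    """Evaluate |2^l1*u1 + sum_{i>=2} (-1)^si * 2^li * ui| * 2^(-r).

    inputs      -- odd integers u1..un
    left_shifts -- l1..ln, each >= 0
    signs       -- s2..sn, each 0 (add) or 1 (subtract); u1 is always added
    r           -- right shift; must divide the result exactly (sketch-only check)
    """
    if not (len(inputs) == len(left_shifts) == len(signs) + 1):
        raise ValueError("need n inputs, n left shifts and n - 1 signs")
    total = inputs[0] << left_shifts[0]
    for u, l, s in zip(inputs[1:], left_shifts[1:], signs):
        term = u << l
        total += -term if s else term
    total = abs(total)
    if total % (1 << r):
        raise ValueError("right shift r would discard non-zero bits")
    return total >> r


# Two-input adders: 47 = 2^4 * (2^2 - 1) - 1, i.e., two A-operations.
three = a_operation([1, 1], [2, 0], [1])
print(three, a_operation([three, 1], [4, 0], [1]))   # 3 47

# One three-input adder: 47 = 2^6 - 2^4 - 1.
print(a_operation([1, 1, 1], [6, 4, 0], [1, 1]))     # 47

# Right shift example: |1 + 3| / 2^2 = 1.
print(a_operation([1, 3], [0, 0], [0], r=2))         # 1
```

The first two calls mirror the CSD form of 47: two cascaded two-input operations, or a single three-input operation.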

An array of interconnected A-operations forms an SCM or an MCM block. The MCM block is built upon SCM because the latter is the simplest case. The SCM array is represented using directed acyclic graphs (DAGs) with the following characteristics [33,34,35,36]:

  • The output of each A-operation is called fundamental.

  • For a graph with m A-operations, there are m + 1 vertices and m fundamentals.

  • Each vertex has an in-degree n, except for the input vertex which has in-degree zero.

  • A vertex with in-degree n corresponds to an n-input A-operation.

  • Each vertex has out-degree larger than or equal to one except for the output vertex which has out-degree zero.

  • The constant resulting from the last A-operation is the output fundamental (OF). The constants resulting from previous A-operations are non-output fundamentals (NOFs).

In the MCM case, there are several OFs.

The DAG representation is the most useful for saving arithmetic operations because it makes it possible to exploit interconnections of A-operations that cannot be seen in the CSD representation. This expands the opportunity to optimize the constant multiplication blocks. For example, the product 45X, with \( 45 = 2^6 - 2^4 - 2^2 + 2^0 \) (i.e., a CSD string “\( 10\overline{1}0\overline{1}01 \)”), needs three 2-input additions and has a critical path of two additions, as shown in Fig. 4a. However, by using the DAG approach, the multiplication 45X requires two 2-input additions and has a critical path of two additions. In this case, it is possible to factorize the constant into two factors, namely, 5 and 9, as shown in Fig. 4b.

Fig. 4 Structure of the product 45X with (a) constant 45 expressed in CSD and (b) constant 45 in graph representation, using two-input adders
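The comparison between the CSD structure and the factorized graph for 45X can be checked with a small graph evaluator. This is only a sketch: the node encoding `(i, li, j, lj, sj)` is hypothetical, the pairing of terms in the CSD structure is our assumption about how the three additions are arranged, and intermediate values are left unnormalized (not reduced to odd) for brevity.

```python
def evaluate_graph(nodes):
    """Evaluate a shift-and-add graph built from two-input A-operations.

    Each node is (i, li, j, lj, sj) and produces
    |(f[i] << li) +/- (f[j] << lj)|, where f[0] = 1 is the graph input and
    sj = 1 selects subtraction. Returns (fundamentals, depth per vertex).
    """
    fundamentals, depth = [1], [0]
    for i, li, j, lj, sj in nodes:
        term = fundamentals[j] << lj
        value = abs((fundamentals[i] << li) + (-term if sj else term))
        fundamentals.append(value)
        depth.append(1 + max(depth[i], depth[j]))
    return fundamentals, depth


# CSD structure for 45 = 2^6 - 2^4 - 2^2 + 2^0 (assumed pairing of terms).
csd_graph = [(0, 6, 0, 4, 1),   # 48 = 64 - 16
             (0, 2, 0, 0, 1),   # 3  = 4 - 1
             (1, 0, 2, 0, 1)]   # 45 = 48 - 3

# Factorized graph: 45 = 5 * 9.
factored_graph = [(0, 2, 0, 0, 0),   # 5  = 4 + 1
                  (1, 3, 1, 0, 0)]   # 45 = 8*5 + 5

for name, graph in (("CSD", csd_graph), ("factored", factored_graph)):
    f, d = evaluate_graph(graph)
    print(name, f[-1], len(graph), d[-1])
# CSD 45 3 2
# factored 45 2 2
```

Each printed line lists the output constant, the number of additions, and the critical path, matching the counts given in the text.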

It is important to mention that a multiplicative graph is the graph obtained by cascading subgraphs, and the union point between two cascaded subgraphs in a multiplicative graph is called an articulation point [37]. This is illustrated in Fig. 5a. A particular case is the completely multiplicative graph, where each cascaded subgraph is composed of one A-operation, as shown in Fig. 5b [4]. The graph presented in Fig. 4b is an example of a completely multiplicative graph with 2-input A-operations. Graphs without articulation points are referred to as non-multiplicative graphs [37]. A cascaded interconnection of a completely multiplicative graph with a non-multiplicative graph is called a generalized graph; see Fig. 5c.

Fig. 5 (a) Multiplicative graph, (b) completely multiplicative graph, and (c) generalized graph

The speed of a design is restricted by the critical path. The pipelining technique shortens the critical path by introducing registers along the data path [38]. In FPGA implementations, constant multiplications involving shift-and-add operations can be made fully pipelined at a low extra cost. Pipelining has a small overhead because the logic blocks in FPGAs include memory elements, which are otherwise unused [39, 40]. For example, Table 1 shows the number of logic elements used to implement the multiplier 45X (for an 8-bit input) in an Altera Cyclone IV EP4CE115F29C7 FPGA. We observe that only three extra logic elements are needed in the pipelined implementation, which represents an increase of 9.7% in resource utilization compared with the non-pipelined case. Nevertheless, the frequency of operation is increased by 31.7%.

Table 1 Synthesis results of pipelined and non-pipelined implementations of a 45X multiplier in the Altera Cyclone IV EP4CE115F29C7 FPGA

Due to the aforementioned observation, the implementation cost will be measured by the number of registered operations (R-operations), i.e., either an addition-register pair or a single register, needed to implement constant multiplications. Two R-operations with the same cost are illustrated in a simplified way in Fig. 6. Hence, the PSCM problem consists in finding the pipelined array of A-operations that forms a single-constant multiplier using the minimum number of R-operations. Similarly, the PMCM problem consists in finding the pipelined array of A-operations that forms a multiple-constant multiplier using the minimum number of R-operations.

Fig. 6 R-operations with the same cost
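To make the notion of an R-operation concrete, the following Python sketch models a fully pipelined 45X multiplier as two cascaded adder-register pairs, using the factorization 45 = 5 × 9 discussed earlier. The class name `Pipelined45` and the cycle-by-cycle modelling style are assumptions of this example; it is a behavioural illustration, not a description of the synthesized circuit in Table 1.

```python
class Pipelined45:
    """Two cascaded R-operations computing 45*x = 9 * (5*x).

    Each stage is an adder followed by a register (one R-operation).
    Registers are assumed to reset to zero.
    """

    def __init__(self):
        self.reg1 = 0   # output register of the first R-operation (5x)
        self.reg2 = 0   # output register of the second R-operation (45x)

    def clock(self, x: int) -> int:
        """Apply one clock edge with input x; return the output register."""
        next_reg1 = (x << 2) + x                   # 5x  = 4x + x
        next_reg2 = (self.reg1 << 3) + self.reg1   # 45x = 8*(5x) + 5x
        self.reg1, self.reg2 = next_reg1, next_reg2
        return self.reg2


pipe = Pipelined45()
print([pipe.clock(x) for x in (1, 2, 3, 0, 0)])   # [0, 45, 90, 135, 0]
```

A new input is accepted on every clock edge, and each product emerges after passing through both registers, so the adders never form a combinational chain longer than one addition.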

To calculate the lower bounds for the number of R-operations required to implement PSCM and PMCM blocks, we need the following information from a constant:

  1) Its MNSD, denoted by S. We will also refer to this number in a more informal manner as “the number of non-zero digits”.

  2) Its number of prime factors (counted with multiplicity, i.e., repeated prime factors are counted each time). This number is denoted by Ω.
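Both quantities are easy to compute for a given constant. The sketch below is one possible way to do so in Python; the function names are hypothetical, S is obtained by counting the non-zero digits of the CSD form (using the modulo-4 rule), and Ω is obtained by plain trial division, which is adequate for the word lengths considered here.

```python
def mnsd(c: int) -> int:
    """Minimum number of signed digits S, counted on the CSD form of c."""
    if c <= 0:
        raise ValueError("c must be a positive integer")
    count = 0
    while c:
        if c & 1:
            c -= 2 - (c & 3)   # remove digit +1 (c % 4 == 1) or -1 (c % 4 == 3)
            count += 1
        c >>= 1
    return count


def num_prime_factors(c: int) -> int:
    """Number of prime factors Omega, counted with multiplicity."""
    if c < 2:
        return 0
    count, p = 0, 2
    while p * p <= c:
        while c % p == 0:
            c //= p
            count += 1
        p += 1
    return count + (1 if c > 1 else 0)


for c in (45, 47):
    print(c, mnsd(c), num_prime_factors(c))
# 45 4 3
# 47 3 1
```

For the two constants used as running examples, 45 = 3 · 3 · 5 has S = 4 and Ω = 3, whereas the prime 47 has S = 3 and Ω = 1.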

3 Proposed lower bounds

In the following, we state, in Subsection 3.1, Theorems 1 to 8 to derive the lower bounds of R-operations in PSCM and, in Subsection 3.2, Theorems 9 and 10 for PMCM, along with their corresponding proofs. The pipelining operation, which has not been addressed in the previous works [3] and [4], is explicitly included in the proposed lower bounds through the R-operations.

3.1 PSCM case

Whenever a constant c is mentioned in the theorems of this sub-section (Theorems 1 to 8), we consider that the MNSD of that constant is S and its number of prime factors is Ω.

Theorem 1 provides the upper limit of non-zero digits that can be generated by any graph with a given number of depth levels, regardless of its number of R-operations. From this, we can know the minimum number of depth levels that a graph must have to implement a constant with a given S.

Theorems 2 and 3 prove the properties of the completely multiplicative graphs, namely, generating the upper limit of non-zero digits mentioned in Theorem 1 with the minimum possible number of R-operations. From them, we have that the completely multiplicative graph is a solution with the lower bound for the number of R-operations. However, this graph has articulation points, and every articulation point represents the union between two cascaded subgraphs, i.e., the product of two smaller constants. Therefore, Theorem 4 uses Ω to identify which constants can be implemented with the completely multiplicative graph (for example, prime constants cannot be factorized into smaller constants; thus, they cannot be implemented by a completely multiplicative graph).

Theorem 5 identifies the minimum number of R-operations needed in any non-multiplicative graph with a given number of depth levels, and Theorem 6 proves that non-multiplicative graphs can generate the upper limit of non-zero digits mentioned in Theorem 1 with that minimum number of R-operations. Then, Theorem 7 establishes the lower bound for the number of R-operations needed to implement a prime constant (Ω = 1).

Finally, Theorem 8 completes the information of Theorems 4 and 7, namely, it gives the lower bound of R-operations needed to implement non-prime constants that have fewer prime factors than the number of subgraphs used in a completely multiplicative graph.

Theorem 1

A graph with p depth levels can provide at most \( n^p \) non-zero digits for a constant.


The proof is given by induction (see the proof of Theorem 6.9 in [39] for the case of 2-input A-operations):

  1) The base case corresponds to the first depth level, where an n-input A-operation can form a constant with at most n non-zero digits. This is true since the input of any graph has one non-zero digit [3, 4, 39].

  2) As the inductive step, we assume that, in the p-th level, there are at most \( n^p \) non-zero digits. In the (p + 1)-th level, an A-operation can form a constant whose number of non-zero digits is at most the sum of the numbers of non-zero digits at every input of that A-operation. This is at most n times the maximum number of non-zero digits available in the previous level, i.e., \( n \times n^p = n^{p+1} \) non-zero digits.

Since assuming that the theorem is true for p implies that the theorem is also true for p + 1, and since the base case is also true, the proof is complete. An adder, regardless of its number of inputs, cannot generate more non-zero digits than the sum of the numbers of non-zero digits in every one of its inputs. Thus, the MNSD can grow by a factor of at most n per depth level, which happens only if all the inputs of the n-input adder placed in a given depth level come from the immediately previous depth level. ■
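Theorem 1 can be spot-checked numerically for small cases. The Python sketch below enumerates, for n = 2, the odd fundamentals reachable within p depth levels when the left shifts are limited to a small range, and confirms that none of them has more than \( 2^p \) non-zero digits. The function names and the shift limit `max_shift` are assumptions of this example, and a bounded search of this kind is an illustration, not a substitute for the proof.

```python
from itertools import product


def mnsd(c: int) -> int:
    """Number of non-zero digits in the CSD form of a positive integer."""
    count = 0
    while c:
        if c & 1:
            c -= 2 - (c & 3)
            count += 1
        c >>= 1
    return count


def odd_part(v: int) -> int:
    """Strip trailing zeros, i.e., apply the right shift r of the A-operation."""
    while v % 2 == 0:
        v //= 2
    return v


def reachable(depth: int, max_shift: int = 4) -> set[int]:
    """Odd fundamentals reachable within `depth` levels of 2-input A-operations."""
    values = {1}
    for _ in range(depth):
        new = set(values)
        for u1, u2 in product(values, repeat=2):
            for l1, l2 in product(range(max_shift + 1), repeat=2):
                for sign in (1, -1):
                    v = abs((u1 << l1) + sign * (u2 << l2))
                    if v:
                        new.add(odd_part(v))
        values = new
    return values


for p in (1, 2):
    worst = max(mnsd(c) for c in reachable(p))
    print(p, worst, worst <= 2 ** p)
# 1 2 True
# 2 4 True
```

The bound is met with equality in both cases; for instance, 45 is reachable with two depth levels and has four non-zero digits.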

Theorem 2

A completely multiplicative graph with p A-operations can generate \( n^p \) non-zero digits.


This proof is a straightforward extension of the proof of Theorem 6.8 in [39], which corresponds to completely multiplicative graphs with 2-input A-operations. As stated earlier, the input of a graph has one non-zero digit. In the completely multiplicative graph, there are at most n non-zero digits after the A-operation placed at the first depth level. Cascading an A-operation to that output yields at most n × n non-zero digits, and so on. The number of non-zero digits at depth level p is at most n times the number of non-zero digits of a fundamental at the (p − 1)-th depth level. Consequently, the maximum number of non-zero digits at the p-th depth level is \( n^p \). ■

Theorem 3

A completely multiplicative graph with p depth levels needs only p R-operations.


The completely multiplicative graph with p depth levels has p A-operations, and every A-operation forms a subgraph. Pipelining between two subgraphs needs only one register, according to [38], because the pipelining occurs on the articulation point. This results in every A-operation being followed by a register. Since an A-operation followed by a register is considered an R-operation, there are only p R-operations in total. This is illustrated in Fig. 7. ■

Fig. 7 The pipelined completely multiplicative graph achieves \( n^p \) non-zero digits with the minimum number of n-input R-operations, p, and the minimum number of depth levels, p

Theorem 4

A constant with \( n^{p-1} + 1 \le S \le n^p \) and Ω ≥ p needs at least p R-operations.


From Theorem 2, we have that a constant with \( n^{p-1} + 1 \le S \le n^p \) non-zero digits can be implemented with at least p depth levels, which implies at least p A-operations. From Theorem 3, we have that a completely multiplicative graph can generate those values for S with only p R-operations. The completely multiplicative graph with p R-operations consists of p cascaded subgraphs; thus, a constant implemented with that graph must have at least p prime factors. Since Ω ≥ p holds, the completely multiplicative graph can be employed to implement that constant using p R-operations. ■

Theorem 5

A non-multiplicative graph with p depth levels needs at least (2p − 1) R-operations.


According to Theorem 3, if a graph with p depth levels has only p R-operations in total, it must be a pipelined completely multiplicative graph. According to Theorem 2, that graph can generate the maximum possible number of non-zero digits, namely, \( n^p \). To make that optimal graph non-multiplicative, the (p − 1) articulation points must be eliminated. From [38], it is known that at least one additional R-operation must be added for every eliminated articulation point. Therefore, at least (2p − 1) R-operations are required, i.e., the original minimum of p R-operations in the form of addition-delay pairs plus the additional (p − 1) R-operations in the form of pure delays. Figure 8 shows an example with p = 3. ■

Fig. 8 Non-multiplicative graph with p = 3 depth levels and p − 1 extra R-operations in the form of pure delays

Theorem 6

A non-multiplicative graph with p depth levels and (2p − 1) R-operations can generate \( n^p \) non-zero digits.


Consider a graph with p depth levels formed by two completely multiplicative graphs of (p − 1) levels each, connected in parallel from the input of the graph, and one A-operation placed in the p-th level summing up the outputs of the aforementioned graphs. The output of one of these graphs is connected to n − 1 inputs of the last A-operation, and the output of the other graph is connected to the remaining input of the last A-operation. This is a non-multiplicative graph because it is not formed by cascading subgraphs, and it is composed of (2p − 1) A-operations. According to Theorem 2, we can obtain \( n^{p-1} \) non-zero digits from each of the completely multiplicative graphs, and according to Theorem 3, these graphs can be pipelined without requiring extra registers. Since the last A-operation can combine the \( n^{p-1} \) non-zero digits present at each of its n inputs and can be pipelined without extra cost, the resulting graph generates \( n \times n^{p-1} = n^p \) non-zero digits using (2p − 1) R-operations. An example of this is shown in Fig. 9. ■

Fig. 9 Non-multiplicative graph that generates the maximum number of non-zero digits, \( n^p \), with the minimum number of R-operations in non-multiplicative graphs

Theorem 7

A constant with \( n^{p-1} + 1 \le S \le n^p \) and Ω = 1 needs at least 2p − 1 R-operations.


Since Ω = 1 holds, the non-multiplicative graph must be employed to implement that constant. From Theorem 6, we have that a constant with \( n^{p-1} + 1 \le S \le n^p \) non-zero digits can be implemented with at least p depth levels and at least 2p − 1 R-operations. This is a lower bound for the number of R-operations, since from Theorem 5, we have that a non-multiplicative graph with p levels needs at least 2p − 1 R-operations. ■

Theorem 8

A constant with \( n^{p-1} + 1 \le S \le n^p \) and 1 < Ω < p needs at least (2p − Ω) R-operations.


From Theorem 1, we have that p depth levels are necessary to achieve the values of S in the specified range. Since Ω < p holds, we can take advantage of a completely multiplicative graph with at most Ω − 1 R-operations, which, according to Theorem 2, generates at most \( n^{\varOmega - 1} \) non-zero digits and represents the product of Ω − 1 factors. The last factor can be formed with a non-multiplicative subgraph with [p − (Ω − 1)] depth levels. According to Theorem 5, this subgraph needs at least 2[p − (Ω − 1)] − 1 R-operations, and according to Theorem 6, it can generate \( n^{p - (\varOmega - 1)} \) non-zero digits. The total graph, illustrated in Fig. 10, can generate at most \( n^{\varOmega - 1} \times n^{p - (\varOmega - 1)} = n^p \) non-zero digits and uses at least (Ω − 1) + 2[p − (Ω − 1)] − 1 = 2p − 2(Ω − 1) + (Ω − 1) − 1 = 2p − (Ω − 1) − 1 = (2p − Ω) R-operations. ■

Fig. 10 Generalized graph that generates the maximum number of non-zero digits, \( n^p \), with the minimum number of R-operations in a multiplicative graph for constants with fewer prime factors than the minimum number of depth levels

Finally, from Theorem 1, we have that the number of depth levels necessary to achieve S is \( p = \left\lceil \log_n(S) \right\rceil \). Substituting this value for p and using Theorems 4, 7, and 8, we obtain the lower bound for the number of R-operations needed to form a PSCM block as follows:

$$ {L}_{PSCM}=\left\{\begin{array}{ll}2\left\lceil { \log}_n(S)\right\rceil -\varOmega; & \varOmega <\left\lceil { \log}_n(S)\right\rceil, \\ \left\lceil { \log}_n(S)\right\rceil; & \varOmega \ge \left\lceil { \log}_n(S)\right\rceil .\end{array}\right. $$
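The piecewise bound above translates directly into code. The following Python sketch takes S, Ω, and n as inputs; the function names are hypothetical, and the ceiling of the logarithm is computed with integer arithmetic to avoid floating-point rounding, which is a design choice of this example.

```python
def ceil_log(n: int, s: int) -> int:
    """Smallest p >= 0 with n**p >= s, i.e., ceil(log_n(s)) without floats."""
    if n < 2 or s < 1:
        raise ValueError("need n >= 2 and s >= 1")
    p, power = 0, 1
    while power < s:
        power *= n
        p += 1
    return p


def pscm_lower_bound(s: int, omega: int, n: int = 2) -> int:
    """Lower bound on R-operations for a PSCM block.

    s     -- MNSD of the constant
    omega -- number of prime factors, counted with multiplicity
    n     -- number of inputs of each adder
    """
    p = ceil_log(n, s)                 # minimum number of depth levels
    return 2 * p - omega if omega < p else p


print(pscm_lower_bound(4, 3, n=2))   # 45 with 2-input adders -> 2
print(pscm_lower_bound(3, 1, n=2))   # 47 with 2-input adders -> 3
print(pscm_lower_bound(3, 1, n=3))   # 47 with 3-input adders -> 1
```

The values S = 4, Ω = 3 for 45 and S = 3, Ω = 1 for 47 follow from the CSD forms and factorizations given in Section 2. The prime 47 needs one R-operation more than its two adders with n = 2, whereas 45 = 5 × 9 reaches the bound with a completely multiplicative graph.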

3.2 PMCM case

The theorems in this section are stated for N constants \( c_1, c_2, \dots, c_N \), whose respective MNSDs are \( S_1, S_2, \dots, S_N \) and whose respective numbers of prime factors are \( \varOmega_1, \varOmega_2, \dots, \varOmega_N \), such that \( S_1 \le S_2 \le \dots \le S_N \).

Theorem 9 indicates the lower bound for the number of n-input A-operations needed to form an MCM block. If pipelining is added, more R-operations than the aforementioned lower bound may be needed because the constants with fewer prime factors may use non-multiplicative graphs, which require extra R-operations (see Theorems 5 to 8). Besides, all the outputs of the PMCM block must have equal number of depth levels to balance the input–output delay, which also may require extra R-operations. Based on these observations, Theorem 10 extends the lower bound provided in Theorem 9 by identifying at least how many extra R-operations would be needed. From these theorems, we obtain the lower bound for the number of R-operations needed to form a PMCM block.

Theorem 9

At least K n-input A-operations are needed to build an MCM block, where K is given by

$$ K=\left\lceil { \log}_n\left({S}_1\right)\right\rceil +{\displaystyle \sum_{i=1}^{N-1} E\left({S}_i,{S}_{i+1}\right)}, $$

with \( E\left({S}_i,{S}_{i+1}\right)=\left\{\begin{array}{c}\hfill 1;\kern5em {S}_i={S}_{i+1},\hfill \\ {}\hfill \left\lceil { \log}_n\frac{S_{i+1}}{S_i}\right\rceil; \kern0.75em {S}_i<{S}_{i+1}.\hfill \end{array}\right. \)


Recall that every A-operation has only one possible configuration and therefore can generate only one fundamental. Only shifted (i.e., scaled by a power of two) versions of that fundamental can be obtained from that A-operation. Since the target constants are integer and odd by definition, it is not possible to obtain two target constants from the same A-operation. Therefore, there must be at least N n-input A-operations for the N constants. Note that, since the terms \( S_i \) are sorted in ascending order, \( S_1 \) corresponds to the simplest constant, i.e., the one with the smallest number of non-zero digits. From Theorem 1, we have that with p depth levels we can obtain at most \( n^p \) non-zero digits. By using the relation \( n^p \ge S_1 \), we have that the minimum number of levels necessary to generate \( S_1 \) non-zero digits is \( \left\lceil \log_n(S_1) \right\rceil \), which implies the existence of at least \( \left\lceil \log_n(S_1) \right\rceil \) A-operations for that constant. Finally, if \( S_{i+1} > n \times S_i \) holds, a single A-operation is not able to generate the constant \( c_{i+1} \) when only coefficients with at most \( S_i \) digits are available, because the number of non-zero digits at the output of an A-operation is at most the sum of the numbers of non-zero digits at its inputs. Therefore, at least \( \left\lceil \log_n(S_{i+1}/S_i) \right\rceil \) A-operations will be required. This proof is a straightforward extension of the proof given in [3] for the lower bound of 2-input A-operations that form an MCM block. ■

Theorem 10

At least L R-operations are needed to build a PMCM block, where L = K + F + G, with

$$ F=\left\{\begin{array}{c}\hfill {\displaystyle \underset{i}{ \max }}\left\{\left\lceil { \log}_n\left({S}_i\right)\right\rceil -{\varOmega}_i\right\};\kern0.5em \forall\ i\kern0.5em \mathrm{such}\ \mathrm{that}\kern0.75em {\varOmega}_i<\left\lceil { \log}_n\left({S}_i\right)\right\rceil, \hfill \\ {}\hfill 0;\kern8.25em \mathrm{otherwise}.\hfill \end{array}\right. $$
$$ G={\displaystyle \sum_{i=1}^{N-1}\left(\left\lceil { \log}_n\left({S}_N\right)\right\rceil -\left\lceil { \log}_n\left({S}_i\right)\right\rceil \right)} $$

and K given in (7).


Consider that there is a constant \( c_m \) that satisfies \( \varOmega_m < \left\lceil \log_n(S_m) \right\rceil \) and, if there are more constants that satisfy such a condition, \( c_m \) has the greatest difference \( \left\lceil \log_n(S_m) \right\rceil - \varOmega_m \). From Theorem 8, we have that the constant can be formed by cascading a non-multiplicative graph with a completely multiplicative graph, where the non-multiplicative graph needs \( 2\left[ \left\lceil \log_n(S_m) \right\rceil - (\varOmega_m - 1) \right] - 1 \) R-operations. Since Theorem 9 has not taken into consideration the number of prime factors, only \( \left\lceil \log_n(S_m) \right\rceil - (\varOmega_m - 1) \) A-operations have been accounted for in that theorem, under the assumption that the constant \( c_m \) can be constructed with the optimal completely multiplicative graph. Therefore, at least \( \left[ \left\lceil \log_n(S_m) \right\rceil - (\varOmega_m - 1) \right] - 1 \) extra R-operations must be included when pipelining is applied, which explains the term F. The term G is explained by the fact that extra R-operations may be needed to achieve the same number of pipeline stages from input to output for every constant. Since the minimum depth level of a constant is given by \( \left\lceil \log_n(S) \right\rceil \), the differences between the minimum depth level of the constant \( c_N \) (which has the greatest depth level among all constants) and the minimum depth levels of the other constants are accumulated in the term G. ■

From Theorem 10, we can express the lower bound for the number of R-operations in the PMCM case as

$$ {L}_{PMCM}=\left\lceil { \log}_n\left({S}_1\right)\right\rceil +{\displaystyle \sum_{i=1}^{N-1}\left(\left\lceil { \log}_n\left({S}_N\right)\right\rceil -\left\lceil { \log}_n\left({S}_i\right)\right\rceil \right)}+{\displaystyle \sum_{i=1}^{N-1} E\left({S}_i,{S}_{i+1}\right)}+ F, $$

with \( E\left({S}_i,{S}_{i+1}\right)=\left\{\begin{array}{c}\hfill 1;\kern5em {S}_i={S}_{i+1},\hfill \\ {}\hfill \left\lceil { \log}_n\frac{S_{i+1}}{S_i}\right\rceil; \kern0.75em {S}_i<{S}_{i+1},\hfill \end{array}\right. \)

and \( F=\left\{\begin{array}{c}\hfill {\displaystyle \underset{i}{ \max }}\left\{\left\lceil { \log}_n\left({S}_i\right)\right\rceil -{\varOmega}_i\right\};\kern0.5em \forall\ i\kern0.5em \mathrm{such}\ \mathrm{that}\kern0.75em {\varOmega}_i<\left\lceil { \log}_n\left({S}_i\right)\right\rceil, \hfill \\ {}\hfill 0;\kern8.25em \mathrm{otherwise}.\hfill \end{array}\right. \)
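The PMCM bound can be evaluated in the same way. The Python sketch below computes K, F, and G from a list of \( (S_i, \varOmega_i) \) pairs; the function names and the input format are hypothetical. As an example it uses the constants 3, 13, 21, and 37, whose values of S and Ω were worked out by hand for this sketch from their CSD forms and factorizations; the result is not taken from the tables of this paper.

```python
def ceil_log(n: int, num: int, den: int = 1) -> int:
    """Smallest e >= 0 with n**e * den >= num, i.e., ceil(log_n(num / den))."""
    e, power = 0, 1
    while power * den < num:
        power *= n
        e += 1
    return e


def pmcm_lower_bound(constants, n=2):
    """constants: list of (S_i, Omega_i) pairs; returns (K, F, G, L)."""
    ordered = sorted(constants)                 # ascending S_i
    s = [si for si, _ in ordered]
    depth = [ceil_log(n, si) for si in s]       # minimum depth of each constant
    # K: minimum number of A-operations (Theorem 9).
    k = depth[0]
    for a, b in zip(s, s[1:]):
        k += 1 if a == b else ceil_log(n, b, a)
    # F: largest shortfall of prime factors with respect to the minimum depth.
    f = max([d - om for d, (_, om) in zip(depth, ordered) if om < d], default=0)
    # G: registers that balance every output to the depth of c_N.
    g = sum(depth[-1] - d for d in depth[:-1])
    return k, f, g, k + f + g


# (S, Omega) for 3, 13, 21, 37 (hand-computed assumptions of this sketch):
# 3 = 4 - 1; 13 = 16 - 4 + 1; 21 = 16 + 4 + 1 = 3 * 7; 37 = 32 + 4 + 1.
example = [(2, 1), (3, 1), (3, 2), (3, 1)]
print(pmcm_lower_bound(example, n=2))   # (4, 1, 1, 6)
```

In this example, K = 4 accounts for one A-operation per constant, F = 1 accounts for the prime constants with three non-zero digits, and G = 1 accounts for the register that delays the output for the constant 3 to the common depth of two levels.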

4 Results and comparisons

In this section, comparisons of the proposed lower bounds with the lower bounds currently available in literature are presented, detailing PSCM and PMCM cases in Subsections 4.1 and 4.2, respectively. In all cases, two and three-input additions were considered.

First, the PSCM case is addressed for n = 2 (i.e., 2-input additions) with an illustration of the lower bounds averaged over all the constants with a word length of B bits, where B goes from 1 to 14. This illustration compares the proposed lower bound with the existing lower bounds from [3] and [4], showing that the proposed lower bound is tighter. An example is also included, where the pipelined shift-and-add multipliers for some constants are constructed with 2-input and 3-input additions.

The effectiveness of the PMCM lower bound is demonstrated by examples, where pipelined shift-and-add multiple constant multiplication blocks are constructed using the algorithms from [7, 8, 22, 30] and [36] for the case of 2-input additions and the algorithm from [10] for the case of 3-input additions. The proposed lower bound is compared with the lower bound from [3] in the case of 2-input additions, and in most of the cases, it provides better estimation of the number of required R-operations. For n = 3 (i.e., 3-input additions), there are no theoretical lower bounds currently available in literature. Thus, the proposed lower bound is only compared with the solution from [10]. In that case, the proposed lower bound falls short only by one R-operation.

4.1 PSCM case

The lower bounds from [3] and [4], as well as the proposed lower bound \( L_{PSCM} \) from (6), are averaged over all constants with B bits, where B is between 1 and 14. These averages are shown in Fig. 11. We can observe the tightening achieved by the proposed lower bound, i.e., the proposed lower bound is in general greater than the lower bounds currently available in the literature. Table 2 presents, for n = 2, the percentage of constants with improved lower bounds among 10,000 random 14-bit constants and among 10,000 random B-bit constants, with B between 15 and 32.

Fig. 11 Average lower bounds

Table 2 Percentage of constants with improved lower bounds
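The percentages reported in Table 2 come from comparing two bounds over a random sample of constants. The Python sketch below illustrates only that bookkeeping. It does not implement the bound from (6) or the bounds from [3] and [4]: the two bound functions are passed in as arguments, and the names `random_constants` and `percent_improved`, the fixed seed, and the choice of sampling from the range [2^(B-1), 2^B - 1] are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence


def random_constants(bits: int, count: int, seed: int = 0) -> List[int]:
    """Draw `count` random constants whose word length is exactly `bits`.

    Assumption: a B-bit constant lies in [2**(B-1), 2**B - 1].
    """
    rng = random.Random(seed)
    low, high = 1 << (bits - 1), (1 << bits) - 1
    return [rng.randint(low, high) for _ in range(count)]


def percent_improved(
    new_bound: Callable[[int], int],
    old_bound: Callable[[int], int],
    constants: Sequence[int],
) -> float:
    """Percentage of constants for which new_bound is strictly greater."""
    if not constants:
        raise ValueError("constants must not be empty")
    improved = sum(1 for c in constants if new_bound(c) > old_bound(c))
    return 100.0 * improved / len(constants)


# Toy stand-ins, unrelated to any bound in the paper, used only to
# exercise the harness: the "new" value exceeds the "old" one for odd c.
def toy_old(c: int) -> int:
    return 1


def toy_new(c: int) -> int:
    return 1 + (c & 1)


sample = random_constants(bits=14, count=10_000)
print(f"{percent_improved(toy_new, toy_old, sample):.1f} % improved (toy bounds)")
```

Replacing `toy_old` and `toy_new` with implementations of the bounds under comparison would yield one row of such a table per word length.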

Example 1 presents the pipelined shift-and-add multipliers for constants {11,467}, {11,093}, and {13,003} constructed with 2-input additions (shown in Fig. 12a, c, and e, respectively) and 3-input additions (shown in Fig. 12b, d, and f, respectively). In all the cases, the optimal solutions have the number of R-operations predicted by the proposed lower bound, as shown in Table 3. Moreover, for the case of 2-input additions, the proposed lower bound is tighter than the ones from [3] and [4]: the lower bound from [3] falls short by two R-operations and the lower bound from [4] falls short by one R-operation.

Fig. 12 a Two-input adder graph of constant 11,467, b three-input adder graph of constant 11,467, c two-input adder graph of constant 11,093, d three-input adder graph of constant 11,093, e two-input adder graph of constant 13,003, and f three-input adder graph of constant 13,003

Table 3 Number of R-operations for Example 1 using n = 2 and n = 3 input adders

Example 1 The constants {11,467}, {11,093}, and {13,003} have similar graphs and the same lower bounds, as shown in Table 3. The corresponding graphs are presented in Fig. 12.
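That the three constants of Example 1 behave alike can be cross-checked with a short Python sketch. It is an illustration only and does not compute the bound from (6). It assumes the commonly used depth argument that one n-input addition can multiply the number of non-zero signed digits by at most n, so the number of depth levels is at least the smallest d with n^d ≥ S(c), where S(c) is the number of non-zero digits in the canonical signed digit (CSD) representation of c. The function names `csd_nonzero_count` and `min_depth` are hypothetical.

```python
def csd_nonzero_count(c: int) -> int:
    """Number of non-zero digits in the CSD representation of c > 0."""
    if c <= 0:
        raise ValueError("c must be a positive integer")
    count = 0
    while c:
        if c & 1:
            # Digit is +1 when c = 1 (mod 4) and -1 when c = 3 (mod 4),
            # which guarantees that the next digit is zero.
            digit = 2 - (c & 3)
            c -= digit
            count += 1
        c >>= 1
    return count


def min_depth(c: int, n: int = 2) -> int:
    """Smallest d with n**d >= S(c), using integer arithmetic only."""
    if n < 2:
        raise ValueError("n must be at least 2")
    digits = csd_nonzero_count(c)
    depth, reach = 0, 1
    while reach < digits:
        reach *= n
        depth += 1
    return depth


for constant in (11467, 11093, 13003):
    print(
        constant,
        "S(c) =", csd_nonzero_count(constant),
        "depth(n=2) >=", min_depth(constant, 2),
        "depth(n=3) >=", min_depth(constant, 3),
    )
```

Each of the three constants has eight non-zero CSD digits, so under the stated assumption the sketch reports at least three depth levels for n = 2 and at least two for n = 3 in all three cases.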

4.2 PMCM case

In Example 2, the multiplier block with constants {44; 130; 172}, formed with 2-input additions, is presented. In Table 4, the number of R-operations obtained by the algorithms Hcub [30] with pipelining, PAG using ASAP pipelining [8], and the optimal PAG [8] are listed. Additionally, the lower bound of [3] and the proposed lower bound LPMCM from (8) are given. The proposed lower bound is closer to the number of R-operations needed to implement the multiplier block than the lower bound of [3].

Table 4 Resulting R-operations for Example 2 using n = 2 input adders

Example 3 presents the group of constants {3; 13; 21; 37} that form a multiplier block. The R-operations needed to implement the multiplier block using 2-input additions are obtained with the algorithms RAG-n [36] with pipelining, RSG [22], and OFL [7]. The resulting values are shown in Table 5, where it can be observed that the OFL algorithm requires the fewest R-operations. The lower bound of [3] and the proposed lower bound LPMCM from (8) are also given in Table 5. In this example, the proposed lower bound equals the number of R-operations used by the OFL algorithm to implement the multiplier block.

Table 5 Resulting R-operations for Example 3 using n = 2 input adders

A multiplier block formed with the constants {7,567; 20,406} is illustrated in Example 4. The R-operations needed to implement the multiplier block using 2-input additions are obtained with the algorithm PAG [8]. Table 6 shows the resulting number of R-operations together with the estimated number of R-operations using the lower bound of [3] and the proposed lower bound LPMCM from (8). The R-operations needed to implement the multiplier block using 3-input additions are obtained with the algorithm PAG for 3-input additions [10]. Table 7 shows the resulting number of R-operations along with the estimations using the proposed lower bound LPMCM from (8).

Table 6 Resulting R-operations for Example 4 using n = 2 input adders
Table 7 Resulting R-operations for Example 4 using n = 3 input adders

Finally, Example 5 presents the constants {87,381; 689,493} that form a multiplier block. The R-operations needed to implement the multiplier block using 2-input additions are obtained with the algorithm PAG [8], and the R-operations needed to implement the multiplier block using 3-input additions are obtained with the algorithm PAG for 3-input additions [10]. For 2-input additions, Table 8 shows the resulting number of R-operations together with the estimates given by the lower bound of [3] and the proposed lower bound LPMCM from (8). For 3-input additions, Table 9 shows the resulting number of R-operations along with the estimate given by the proposed lower bound LPMCM from (8). The proposed lower bound provides a reliable estimate of the number of R-operations needed to implement the multiplier block.

Table 8 Resulting R-operations for Example 5 using n = 2 input adders
Table 9 Resulting R-operations for Example 5 using n = 3 input adders

Example 2 (example given in [8]) A multiplier block with the constants from the set {44; 130; 172} has the estimated number of R-operations shown in Table 4 (the resulting graphs are shown in Fig. 1 of [8]).

Example 3 (example given in [7]) A multiplier block with the constants from the set {3; 13; 21; 37} has the estimated number of R-operations shown in Table 5 (the resulting graphs can be seen in Fig. 4 of [7]).

Example 4 (example given in [10]) A multiplier block with the constants from the set {7,567; 20,406} has the estimated number of R-operations shown in Table 6 for 2-input adders and Table 7 for 3-input adders (Fig. 3 of [10] shows the corresponding graphs).

Example 5 A multiplier block with the constants from the set {87,381; 689,493} has the estimated number of R-operations shown in Table 8 for 2-input adders and Table 9 for 3-input adders. The corresponding graphs are shown in Fig. 13.

Fig. 13 a Two-input adder graph by PAG algorithm for the multiplier block {87,381; 689,493} and b three-input adder graph by PAG algorithm for the multiplier block {87,381; 689,493}
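Some of the constant sets in Examples 2 to 5 contain even values. The Python sketch below assumes the usual convention in shift-and-add design, not stated in this section, that a power-of-two factor costs only wiring, so each constant can be reduced to an odd value and a left shift before a multiplier block is analyzed. The helper name `odd_fundamental` and the dictionary of example sets are introduced here for illustration only.

```python
from typing import Dict, Tuple


def odd_fundamental(c: int) -> Tuple[int, int]:
    """Return (odd, shift) such that c == odd << shift, for c > 0."""
    if c <= 0:
        raise ValueError("c must be a positive integer")
    # c & -c isolates the lowest set bit; its position is the shift.
    shift = (c & -c).bit_length() - 1
    return c >> shift, shift


# Constant sets taken from Examples 2 to 5.
example_sets: Dict[str, Tuple[int, ...]] = {
    "Example 2": (44, 130, 172),
    "Example 3": (3, 13, 21, 37),
    "Example 4": (7567, 20406),
    "Example 5": (87381, 689493),
}

for name, constants in example_sets.items():
    reduced = [odd_fundamental(c) for c in constants]
    # Sanity check: every constant is recovered from its odd part and shift.
    assert all((odd << s) == c for (odd, s), c in zip(reduced, constants))
    print(name, reduced)
```

Under this convention, {44; 130; 172} reduces to the odd values {11; 65; 43}, and 20,406 reduces to 10,203 with a shift by one position; the remaining constants are already odd.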

5 Conclusions

New theoretical lower bounds for the number of R-operations in the fully pipelined SCM and the fully pipelined MCM cases for n-input additions/subtractions have been presented. The proposed lower bounds are tighter because pipelining registers were explicitly considered. In addition, it was observed that the use of articulation points allows a rapid increase of the number of non-zero digits from one depth level to the next. The new theoretical lower bounds give a better estimate of the number of operations required to implement a single multiplier or a multiplier block. The tightness of the new lower bounds was illustrated with examples in the comparisons section.