1 Introduction

In mixed-signal applications, low-power and high-performance circuits are crucial. However, traditional CMOS logic is not well-suited for these applications due to the significant switching noise it generates [1]. To address issues related to switching and transient noise, various logic families have been developed over time, specifically designed for low-voltage, low-power mixed-mode circuit designs. Among these, MOS-Current Mode Logic (MCML) and Positive Feedback Source-Coupled Logic (PFSCL), which operate on the current-steering principle, are widely employed in high-speed circuit designs [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. Recent literature [17,18,19,20,21,22,23,24,25] has reported significant advancements in the analysis, design, and optimization of MCML and PFSCL. These developments underscore the potential of MCML as a promising alternative to conventional logic design techniques, particularly for low-power applications in mixed-signal integrated circuits. MCML is also critical in RF and microwave systems such as 5G networks, radar, and satellite communications, where it facilitates high-frequency signal processing. In high-performance computing, MCML plays a vital role in supporting clock distribution networks and high-speed interconnects. In the aerospace and defense sectors, it is widely employed in satellite communication systems and radar signal processing. Its inherent power efficiency and ability to operate at high speeds also make it well-suited for emerging technologies such as quantum computing and cryogenic electronics [26,27,28,29].

Among differential logic families, MCML is a popular choice due to its low-noise characteristics, reduced power consumption, and improved power-delay product at high frequencies. In MCML, transient currents are kept small by using reduced voltage swings during switching events. However, despite its advantages in high-performance digital systems, MCML has the drawback of significant static power consumption. The speed of static MCML logic depends primarily on the charging and discharging of parasitic capacitances during switching, but at lower supply voltages and smaller voltage swings, gate and interconnect power dissipation becomes more dominant, affecting overall performance. The dynamic version of current mode logic (DyCML) [30,31,32,33,34,35,36,37,38,39] offers an enhanced alternative based on charge distribution logic, delivering high performance at lower supply voltages with reduced power consumption. Unlike static CML circuits, DyCML eliminates static current sources and therefore avoids static power dissipation, making it suitable for portable and battery-operated devices. Its dynamic power consumption is independent of the input combination and is lower than that of other differential logic families because of the smaller output voltage swing. In DyCML circuits, parasitic capacitances play a key role in the circuit’s functionality, enabling the pre-charge and evaluation process within a specified clock cycle. Recent research emphasizes the significance of DyCML in achieving low-power, high-speed circuit performance, especially in nanometer CMOS technologies [40, 41].

Although DyCML gates are efficient in terms of delay and power consumption at lower supply voltages, they suffer from an erroneous evaluation problem in multistage operation [37]. When two DyCML gates are cascaded directly, charge can bleed from the pre-charged nodes, leading to incorrect evaluation between the two stages within the limited evaluation window of the clock cycle. To address this issue, two cascading mechanisms have been suggested in the literature [31, 36,37,38], which insert an intermediary circuit, either an inverter or a self-timed buffer, between the stages. While these techniques prevent erroneous evaluation, they do so at the cost of increased power dissipation and delay introduced by the buffers and inverters. This paper proposes an improved technique for cascading DyCML gates that alternates nMOS and pMOS logic stages, following the NORA technique [42,43,44]. The NORA technique effectively solves the erroneous evaluation problem and offers greater logic flexibility, requiring fewer transistors to realize the same function.

The paper is organized into seven sections. Section 2 briefly reviews the cascading mechanisms for DyCML gates and discusses the issues in their multistage operation. Sections 3 and 4 present the proposed complementary dynamic CML inverter and the race-free NORA-based technique for cascading DyCML gates. Section 5 provides a detailed explanation of the design optimization process, employing both Taguchi and ANOVA statistical methods. Section 6 presents the functionality, simulation, and performance of the proposed technique in comparison with the existing techniques. Lastly, the conclusions are presented in Sect. 7.

2 Cascaded dynamic CML

Dynamic Current Mode Logic (DyCML) is a high-speed, high-performance logic style that operates at lower supply voltages and with reduced voltage swings compared to static MOS Current Mode Logic (MCML). Unlike static CML, which uses a static current source, a DyCML gate incorporates a dynamic current source. The fundamental structure of a DyCML gate is illustrated in Fig. 1. In this design, the transistor pair (M1-M2) together with capacitor CS forms the dynamic current source section [31,32,33,34,35,36,37,38]. The transistor pair (M3-M4) serves as the pull-down network (PDN), while transistors M5 and M6 make up the pre-charge circuit. Under the control of the clock (CLK) signal, the DyCML gate alternates between pre-charge and evaluation phases.

When the CLK signal is low, the gate is in the pre-charge phase. In this phase, transistors M2, M5, and M6 are activated, while M1 is deactivated. The differential output nodes, Q and Q̅, are pre-charged to a high voltage level (VOH = VDD) via transistors M5 and M6. At the same time, capacitor CS discharges to ground through transistor M2. During this phase the output is independent of the applied differential input A, because transistors M5 and M6 are on and transistor M1 is off. Once the CLK signal goes high, the circuit moves into the evaluation phase, deactivating M2, M5, and M6 and activating M1. During this phase, the differential outputs are evaluated based on the applied differential inputs. Thus, the DyCML gate operates using a pre-charge-evaluate logic, where the output node capacitances are first pre-charged and then evaluated depending on the inputs.
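The two clock phases can be summarized in a small behavioural sketch. It is an idealized, hypothetical model: the function name `dycml_inverter_levels` and the default values are illustrative assumptions, only settled node voltages are returned, and transients are ignored. The evaluated level VDD − VSWING for the discharged output and for CS follows the voltage levels quoted in Sects. 2.2 and 4.

```python
def dycml_inverter_levels(clk_high, a_high, vdd=1.0, v_swing=0.5):
    """Idealized settled voltages of the N-type DyCML inverter of Fig. 1.

    Returns (Q, Q_bar, V_CS) in volts. Defaults are illustrative only.
    """
    if not clk_high:
        # Pre-charge: M5/M6 pull both outputs to VDD, M2 discharges CS.
        return vdd, vdd, 0.0
    # Evaluation: M1 conducts, and the output selected by the input
    # shares its charge with CS and drops by V_SWING.
    low = vdd - v_swing
    if a_high:
        return low, vdd, low   # M3 conducts, Q falls
    return vdd, low, low       # M4 conducts, Q_bar falls
```

In this model the input has no effect while CLK is low, which mirrors the statement that the pre-charged outputs are independent of the differential input A.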

Fig. 1

Basic Architecture of DyCML gate [31, 37]

In multistage applications, cascading two DyCML gates directly presents a challenge. To illustrate the issue, consider the cascaded DyCML gates depicted in Fig. 2(a) [37]. During the pre-charge phase, both outputs (Q1 and Q2) are high, regardless of the differential input (A). When the evaluation phase begins, Q1 and Q2 initially remain high. If the input A to the first stage is high, the first-stage output (Q1) falls to a low level during evaluation. Because both gates evaluate simultaneously, the second stage starts evaluating while Q1 is still high, so its output (Q2) also incorrectly drops to a low level, as shown in Fig. 2(b) [37]. The erroneous evaluation thus arises because the two directly cascaded stages evaluate at the same time.

Fig. 2

(a) Cascading of two DyCML gates (b) Outputs of the two stages of DyCML gate with respect to the common clock (CLK)

The problem of erroneous evaluation can be resolved by either delaying the second stage’s evaluation until the first stage output is achieved or by ensuring low to high transition at the output. The literature suggests two techniques for implementing multi-stage DyCML gates, which are detailed here:

2.1 Insertion of MCML inverter

Inserting an MCML inverter between two cascaded DyCML stages can prevent erroneous evaluations by ensuring a low-to-high transition at the output [37]. While this technique enables the use of DyCML gates in multi-stage applications, it incurs the drawback of static power consumption by the MCML inverter, limiting its overall suitability.

2.2 Insertion of Self-timed buffer

The second method employs a self-timed buffer [37] placed between consecutive stages of DyCML gates. This buffer translates the voltage stored on capacitor CS,1 in the dynamic current source of the first stage into a full-swing output signal (CLK2), which then triggers the evaluation of the next stage, as illustrated in Fig. 3.

Fig. 3

Cascaded DyCML gates with self-timed buffer [37]

During the pre-charge phase of the first stage, when the CLK signal is low, the output load capacitances (CL,1) are pre-charged to a high voltage level (VDD). At the same time, the capacitor CS,1 discharges to ground potential. This ensures the self-timed buffer output remains low, keeping the second stage in its pre-charge phase. During the evaluation phase, when the CLK signal goes high, the capacitor CS,1 in the first stage charges to a high voltage level (VDD-VSWING), resulting in a high voltage output from the self-timed buffer. As a result, the second stage enters its evaluation phase only after the first stage completes its evaluation.

It is clear that multistage operation of DyCML gates requires intermediate circuitry to realize a logic function correctly, which increases the implementation area. In addition, complementary clock operation places constraints on the overlap period of the CLK and \(\overline{\text{CLK}}\) signals for race-free operation [42,43,44]. These limitations restrict the use of cascaded DyCML gates for general logic function realization. To eliminate the intermediate circuitry, a new complementary dynamic CML inverter gate is proposed that can be cascaded directly with the existing N-type dynamic CML inverter gate; that is, a new NORA-based cascaded DyCML technique is proposed that offers higher logic flexibility with fewer transistors.

The next section presents the complementary dynamic CML inverter, which overcomes the drawbacks of the above techniques by eliminating the intermediate circuitry.

3 Proposed complementary dynamic CML inverter

The basic architecture of the existing DyCML inverter gate is illustrated in Fig. 1. Its pull-down network consists of the NMOS transistor pair (M3-M4), to which the input signal is applied and which is evaluated under the control of the CLK signal; the gate can therefore be called an N-type inverter. Using the same CLK signal, a complementary (P-type) dynamic CML inverter gate is proposed, as illustrated in Fig. 4. In this gate, the PMOS transistor pair (M1-M2) together with capacitor CS forms the dynamic current source section, the PMOS transistor pair (M3-M4) forms the logic network (the P-type counterpart of the PDN, i.e. a pull-up network), and the PMOS transistor pair (M5-M6) constitutes the pre-discharge circuit.

Fig. 4

Proposed complementary Dynamic CML inverter gate

The fundamental operation of the complementary DyCML inverter gate is similar to that of the existing DyCML inverter. The circuit operates on a pre-discharge and evaluate logic, and these two operations occur in the same CLK phases as pre-charge and evaluation in the existing design [37]. When the applied CLK signal is low, the pre-discharge phase occurs: PMOS transistors M1, M5, and M6 are activated, while M2 is deactivated. During this phase the differential output nodes are pre-discharged through the PMOS transistor pair M5 and M6 to a low voltage level equal to the threshold voltage of the PMOS transistor (VOL = |Vt,p|). In addition, capacitor CS is charged to the high potential (VDD) via transistor M1. During this phase a change at the inputs of the logic network does not alter the output voltage level, because the pre-discharge transistors M5 and M6 are ON while transistor M2 remains OFF. Conversely, during the evaluation phase, when the CLK signal goes high, transistors M1, M5, and M6 are turned OFF and M2 is turned ON. This can create a direct current path between the output nodes and the capacitors (CL and CS). The differential outputs are then determined by the applied differential inputs: one of the output nodes is charged to the high voltage (VOH = |Vt,p| + VSWING) through transistor M2, while the other remains at the voltage level |Vt,p|.

Thus, the complementary DyCML inverter gate follows a pre-discharge and evaluate logic in which the output node capacitances are first pre-discharged to a low voltage (|Vt,p|) and then evaluated according to the applied differential inputs. The inverted output waveforms, together with the applied differential inputs and the CLK signal, are illustrated in Fig. 5 for the existing DyCML inverter [31, 37] and the proposed complementary DyCML inverter.

Fig. 5

(a) Output waveforms of the existing DyCML inverter and (b) Output waveforms of the proposed complementary DyCML inverter

3.1 Determining the value of capacitor CS to accomplish desired voltage swing

During the evaluation phase, charge is shifted from capacitor CS of the dynamic current source to the load capacitor CL. This charge transfer follows the charge conservation principle to accomplish the desired voltage swing at the output.

$$\text{V}_\text{DD}\text{C}_\text{S}+\text{C}_\text{OUT}\left|\text{V}_\text{t,p}\right|=\left(\text{C}_\text{OUT}+\text{C}_\text{S}\right)\left(\text{V}_\text{SWING}+\left|\text{V}_\text{t,p}\right|\right)$$
(1)

Where COUT represents the overall load capacitance at the output node, which includes both parasitic capacitances and the external load capacitance CL. Therefore, the value of capacitor CS required to accomplish the desired voltage swing at the output node can be calculated as follows:

$$\text{C}_\text{S}=\frac{\text{C}_\text{OUT}\text{V}_\text{SWING}}{\text{V}_\text{DD}-\text{V}_\text{SWING}-\left|\text{V}_\text{t,p}\right|}$$
(2)

In practice, capacitor CS is implemented as a MOSFET whose source and drain terminals are tied together, so that CS equals \(\text{W}_{\text{C}_\text{S}}\times\text{L}_{\text{C}_\text{S}}\times\text{C}_\text{OX}\). For a given length \(\text{L}_{\text{C}_\text{S}}\), the width \(\text{W}_{\text{C}_\text{S}}\) of the MOSFET follows from:

$$\text{W}_{\text{C}_\text{S}}\text{L}_{\text{C}_\text{S}}=\frac{\text{C}_\text{OUT}\text{V}_\text{SWING}}{\text{C}_\text{OX}\left(\text{V}_\text{DD}-\text{V}_\text{SWING}-\left|\text{V}_\text{t,p}\right|\right)}$$
(3)

Where \(\text{W}_{\text{C}_\text{S}}\) and \(\text{L}_{\text{C}_\text{S}}\) represent the width and length of the MOS transistor for CS, respectively, and COX is the gate oxide capacitance per unit area.
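The sizing relations in Eqs. (1) to (3) can be checked numerically with a short sketch. The function names and the numbers in the usage note (10 fF load, |Vt,p| = 0.3 V, COX = 10 fF/μm²) are hypothetical placeholders chosen for illustration, not values reported in this work.

```python
def source_capacitance(c_out, v_swing, vdd, vtp_abs):
    """Eq. (2): C_S = C_OUT * V_SWING / (VDD - V_SWING - |Vt,p|), in farads."""
    headroom = vdd - v_swing - vtp_abs
    if headroom <= 0:
        raise ValueError("VDD must exceed V_SWING + |Vt,p|")
    return c_out * v_swing / headroom


def settled_output_voltage(c_s, c_out, vdd, vtp_abs):
    """Eq. (1) solved for the final node voltage after charge sharing."""
    return (vdd * c_s + c_out * vtp_abs) / (c_s + c_out)


def cs_gate_width(c_s, length, c_ox):
    """Eq. (3) solved for W: W = C_S / (L * C_OX), in metres."""
    return c_s / (length * c_ox)


# Hypothetical operating point (assumed values, for illustration only).
C_OUT, V_SWING, VDD, VTP = 10e-15, 0.5, 1.0, 0.3
c_s = source_capacitance(C_OUT, V_SWING, VDD, VTP)
v_final = settled_output_voltage(c_s, C_OUT, VDD, VTP)
```

With these assumed numbers, `c_s` comes out at about 25 fF and `v_final` equals |Vt,p| + VSWING, which is the consistency that Eq. (2) is meant to guarantee. The guard on `headroom` reflects that Eq. (2) has no positive solution unless VDD exceeds VSWING + |Vt,p|.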

3.2 Power dissipation of the complementary DyCML inverter gate

In the complementary dynamic CML circuit, there is no direct path between the power supply and ground. The clock signal ensures that transistors M1 and M2 are never activated simultaneously, thereby minimizing static power consumption. However, because the load capacitance must be charged in every evaluation, the circuit still consumes dynamic power, which can be described by:

$$\text{P}_\text{dyn}= \alpha\text{C}_\text{OUT}\text{V}_\text{DD}\text{V}_\text{SWING}\text{f}_\text{CLK}$$
(4)

Where fCLK denotes the frequency of the CLK signal and α represents the circuit’s switching activity.
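As a quick numerical illustration of Eq. (4), the sketch below evaluates the dynamic power for one hypothetical operating point. The function name and all numeric values are assumptions made for the example, not measurements from this work.

```python
def dynamic_power(alpha, c_out, vdd, v_swing, f_clk):
    """Eq. (4): P_dyn = alpha * C_OUT * VDD * V_SWING * f_CLK, in watts."""
    return alpha * c_out * vdd * v_swing * f_clk


# Hypothetical values: 10 fF load, 1 V supply, 0.5 V swing, 1 GHz clock.
p_reduced_swing = dynamic_power(1.0, 10e-15, 1.0, 0.5, 1e9)
# Same expression with the swing set equal to VDD (full-swing reference).
p_full_swing = dynamic_power(1.0, 10e-15, 1.0, 1.0, 1e9)
```

Because Eq. (4) is linear in VSWING, the ratio `p_reduced_swing / p_full_swing` equals VSWING/VDD, which makes explicit how the reduced output swing lowers the dynamic power.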

4 Proposed NORA based cascaded dynamic CML

A new NORA-logic-based technique for cascading DyCML gates is proposed, comprising two complementary dynamic logic blocks (N-type and P-type) as depicted in Fig. 6. This technique does not require a self-timed buffer (STB). The figure shows two dynamic CML gates: the first gate uses a pull-down network (PDN) to implement the logic function, and the second gate uses a pull-up network (PUN) to realize the same logic function.

Fig. 6

Proposed NORA based Cascaded DyCML gates

In this design, a single clock signal (CLK) controls both the pre-charge and evaluation timing of the N-type DyCML stage and the pre-discharge and evaluation timing of the P-type (complementary) DyCML stage. The NORA-based cascaded dynamic CML circuit functions as follows. When the clock signal is low, the output nodes of the N-type DyCML stages are pre-charged to VDD through PMOS pre-charge transistors, and the output nodes of the P-type DyCML stages are pre-discharged to VOL (|Vt,p|) via PMOS discharge transistors controlled by CLK. When the clock signal transitions from low to high, it initiates the sequential evaluation of all cascaded N-type and P-type logic stages.

In this NORA-based cascaded circuit, the differential inputs are applied to transistors M3,1 and M4,1 of the first dynamic CML gate. When the clock signal (CLK) is low, the pre-charge transistors M5,1 and M6,1 of the first gate turn on and pre-charge the nodes Q1 and \(\overline{\text{Q1}}\) to the supply voltage (VDD), whereas the evaluate transistor M1,1 turns off, preventing any charge flow through capacitor CS,1. At the same time, the pre-discharge PMOS transistors M5,2 and M6,2 of the second gate turn on and discharge the output nodes Q2 and \(\overline{\text{Q2}}\) to VOL (|Vt,p|), while transistor M1,2 turns on and charges capacitor CS,2 to the supply voltage (VDD).

Conversely, when the clock signal (CLK) is high, the evaluate transistor M1,1 of the first gate is activated while transistor M2,1 is deactivated. If the input signal A is high, transistor M3,1 turns on and brings output node Q1 to the low voltage (VDD - VSWING), while transistor M4,1 remains in the off state and keeps node \(\overline{\text{Q1}}\) at the high voltage (VDD). These two outputs are applied as inputs to the second dynamic CML gate. During the high phase of the clock, PMOS transistor M2,2 turns on and, depending on the input conditions at transistors M3,2 and M4,2, either node Q2 or node \(\overline{\text{Q2}}\) is charged. In this case, because Q1 is at the low voltage, PMOS transistor M3,2 turns on and charges output node Q2 to the high potential (VOH = |Vt,p| + VSWING), while the other output node \(\overline{\text{Q2}}\) remains at the low voltage level |Vt,p|.
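The voltage levels described for the two cascaded stages can be collected in an idealized behavioural sketch. It is a hypothetical model, not a circuit simulation: the function name, the default |Vt,p| of 0.3 V, and the behaviour for a low input (taken as the mirror image of the high-input case described above) are assumptions, and only settled levels are returned.

```python
def nora_cascade_levels(clk_high, a_high, vdd=1.0, v_swing=0.5, vtp_abs=0.3):
    """Idealized settled outputs ((Q1, Q1_bar), (Q2, Q2_bar)) in volts."""
    v_ol_p = vtp_abs             # pre-discharge level of the P-type stage
    v_oh_p = vtp_abs + v_swing   # evaluated high level of the P-type stage
    if not clk_high:
        # CLK low: stage 1 pre-charged to VDD, stage 2 pre-discharged.
        return (vdd, vdd), (v_ol_p, v_ol_p)
    # Stage 1 (N-type): the selected output drops by V_SWING.
    low1 = vdd - v_swing
    q1, q1_bar = (low1, vdd) if a_high else (vdd, low1)
    # Stage 2 (P-type): the PMOS whose gate sees the lower stage-1
    # output conducts and charges its output node from CS,2.
    if q1 < q1_bar:
        q2, q2_bar = v_oh_p, v_ol_p
    else:
        q2, q2_bar = v_ol_p, v_oh_p
    return (q1, q1_bar), (q2, q2_bar)
```

The sketch highlights why the cascade is race-free in this idealization: the pre-charged stage-1 outputs (both at VDD) keep both PMOS inputs of stage 2 off, so stage 2 can only change state after one stage-1 output has fallen.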

5 Optimization of the proposed NORA based technique using Taguchi design of experiments and analysis of variance (ANOVA) statistical techniques

Taguchi Design of Experiments (DoE) and Analysis of Variance (ANOVA) are two powerful statistical techniques widely used for optimization in engineering, manufacturing, and research processes. Taguchi DoE, developed by Genichi Taguchi, focuses on improving product quality and performance by minimizing the impact of uncontrollable factors, often referred to as noise factors. It uses a systematic and structured approach to design experiments with a reduced number of trials while still capturing essential information about the process or system under study. Taguchi’s method uses orthogonal arrays (OA) to plan experiments strategically and efficiently vary multiple factors at different levels. This not only facilitates a clear understanding of how each parameter affects the desired output response but also helps conserve time and resources by means of a more efficient experimental process.

The Analysis of Variance (ANOVA) technique is commonly used in conjunction with the Taguchi DoE method. It provides a quantitative measure, known as the F-ratio, to determine whether changes in input factor levels have a statistically significant effect on the output variable. ANOVA also allows for the calculation of the percentage contribution of each input factor and their interactions to the overall variation in the results. This information helps researchers prioritize control factors for process optimization. Input factors with higher contribution percentages are considered more influential and are therefore critical for achieving optimal system performance.

The combination of Taguchi DoE for experimental planning and ANOVA for result analysis provides a robust framework for identifying the most favorable process parameters, understanding their impact, and determining the optimal levels to improve product quality and performance while minimizing variability.

In this research, three important independent variables were chosen for study: the width of the NMOS transistors (Wn), the width of the PMOS transistors (Wp), and the supply voltage (VDD). To thoroughly assess their impact on the circuit’s performance, each control parameter was evaluated at three distinct levels. The details of these parameters and their corresponding levels are summarized in Table 1. To efficiently design the experiments, an L9 orthogonal array was used, resulting in a total of nine experimental runs, as presented in Table 2. This structured approach not only simplifies the experimental process but also helps in determining the optimal combination of design parameters for improving circuit performance.
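For illustration, a three-factor, three-level L9 design can be generated and checked for orthogonality as sketched below. The construction uses the first three columns of the standard L9 array; the assignment of columns to (VDD, Wp, Wn) and the run order are assumptions and need not match Table 2.

```python
from itertools import combinations, product


def l9_array():
    """First three columns of the standard L9 orthogonal array (levels 1..3)."""
    rows = []
    for a, b in product(range(3), repeat=2):
        # Third column is (a + b) mod 3, which keeps every column pair balanced.
        rows.append((a + 1, b + 1, (a + b) % 3 + 1))
    return rows


def is_orthogonal(rows):
    """True if every pair of columns contains each level combination once."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        pairs = {(r[i], r[j]) for r in rows}
        if len(pairs) != len(rows):
            return False
    return True


design = l9_array()  # assumed column order: (VDD level, Wp level, Wn level)
```

Nine runs replace the 27 runs of a full factorial design, while each level of every factor still appears three times and each pair of levels from any two factors appears exactly once.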

Table 1 Control factors and their related levels

The Taguchi Design of Experiments (DoE) approach assesses system performance using the Signal-to-Noise Ratio (SNR) as a key evaluation metric. It is essential to highlight that the concept of SNR applied in this context differs from the conventional SNR commonly used in analog circuit design. In this study, the SNR values were computed for each experimental trial and for each output parameter under consideration. The selection of the appropriate SNR formula depends on the desired output characteristic, whether it is aimed at minimization or maximization. Specifically, the formulas for calculating SNR under the “smaller-the-better” and “larger-the-better” conditions are provided in Eqs. (5) and (6), respectively.

$$\text{SNR}=-10\log_{10}\left(\frac{1}{\text{n}}\sum_{\text{i}=1}^{\text{n}}\text{Q}_\text{i}^{2}\right)$$
(5)
$$\text{SNR}=-10\log_{10}\left(\frac{1}{\text{n}}\sum_{\text{i}=1}^{\text{n}}\frac{1}{\text{Q}_\text{i}^{2}}\right)$$
(6)

Where Qi denotes the output value for the i-th experiment and n represents the total number of experimental trials conducted. The corresponding Signal-to-Noise Ratio (SNR) values for every output are plotted in Fig. 7(a-c).
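Eqs. (5) and (6) translate directly into code. The following sketch uses hypothetical function names and is only meant to make the two definitions concrete; the statistical analysis in this work was carried out in Minitab.

```python
import math


def snr_smaller_is_better(values):
    """Eq. (5): SNR = -10 * log10( (1/n) * sum(Q_i^2) ), in dB."""
    n = len(values)
    return -10.0 * math.log10(sum(q * q for q in values) / n)


def snr_larger_is_better(values):
    """Eq. (6): SNR = -10 * log10( (1/n) * sum(1 / Q_i^2) ), in dB."""
    n = len(values)
    return -10.0 * math.log10(sum(1.0 / (q * q) for q in values) / n)
```

For a single observation Q, Eq. (5) reduces to -20 log10(Q), so a smaller delay, power, or PDP value always yields a higher SNR under the smaller-the-better criterion. The numerical SNR depends on the unit in which Q is expressed, so values are only comparable within one output parameter.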

In this study, the response with the highest Signal-to-Noise Ratio (SNR) is considered to have the most significant influence, while a lower SNR indicates a lesser impact. Three critical output parameters have been selected to determine the optimal design configuration: power dissipation, delay, and power-delay product (PDP). To achieve a balanced and efficient design, the objective is to minimize power consumption, delay, and PDP. Statistical analysis and SNR calculations have been performed using Minitab software. A summary of the output results, along with their corresponding SNR values, is presented in Table 2. Among all trials, Experiment 9 recorded the lowest delay, highest power consumption, and the lowest PDP value. This experiment exhibited a relatively low signal-to-noise ratio (SNR) for delay, but a high SNR for both power dissipation and power-delay product (PDP).

Table 2 Output results along with their corresponding SNR values for various performance parameters for proposed NORA based technique
Fig. 7

Plots of SNR for the different parameters of the proposed NORA-based technique (a) Delay (b) Power dissipation (c) PDP

In Table 3, ‘Delta’ represents the change in each performance metric resulting from variations in the corresponding control parameter. A higher Delta value signifies a greater impact on circuit behaviour. By analyzing the results shown in Fig. 7(a-c) alongside the data in Table 3, it is evident that VDD has a stronger influence on delay and power consumption while Wn has a more significant impact on the PDP in the NORA-based cascaded dynamic CML circuit.

Table 3 presents the average Signal-to-Noise Ratio (SNR) values for each output parameter (delay, power consumption, and PDP), calculated at each of the three levels of every control variable: VDD, Wp, and Wn. For each control factor, the level that yields the highest average SNR is considered optimal. Based on this analysis, the optimal levels identified are level 3 for VDD, level 2 for Wp, and level 1 for Wn. These correspond to 1 V for VDD, 0.9 μm for Wp, and 1.8 μm for Wn, the combination most favourable for enhancing overall circuit performance.
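The construction of such a response table (mean SNR per level, Delta, and best level) can be sketched as follows. The SNR list used here is a synthetic placeholder that depends only on the first factor; it is not the data of Table 2, and the column order (VDD, Wp, Wn) of the design is an assumption.

```python
# First three columns of the standard L9 array; levels are 1-based.
L9 = [(1, 1, 1), (1, 2, 2), (1, 3, 3),
      (2, 1, 2), (2, 2, 3), (2, 3, 1),
      (3, 1, 3), (3, 2, 1), (3, 3, 2)]


def response_table(design, snr, factor_names):
    """Mean SNR per level, Delta, and best level for each control factor."""
    table = {}
    for col, name in enumerate(factor_names):
        means = {}
        for level in sorted({row[col] for row in design}):
            vals = [s for row, s in zip(design, snr) if row[col] == level]
            means[level] = sum(vals) / len(vals)
        table[name] = {
            "means": means,
            # Delta: spread between the best and worst level means.
            "delta": max(means.values()) - min(means.values()),
            # Highest mean SNR marks the preferred level.
            "best_level": max(means, key=means.get),
        }
    return table


# Placeholder SNR values (NOT from Table 2), driven by the first factor only.
demo_snr = [-float(row[0]) for row in L9]
demo = response_table(L9, demo_snr, ("VDD", "Wp", "Wn"))
```

In this synthetic case only the first factor shows a non-zero Delta, which is how a dominant control factor would stand out in a response table.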

Table 3 SNR ratios response table for various parameters

To identify which independent variable or control factor most significantly affects a particular output parameter of the proposed NORA based circuit, an Analysis of Variance (ANOVA) is conducted. The resulting P-value from this statistical test serves as an indicator of the factor’s influence. Among the control variables, the one with the lowest P-value is considered to have the most substantial effect on the corresponding output parameter.
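A minimal main-effects ANOVA for an L9 design can be sketched as below. It computes the sum of squares, percentage contribution, and F-ratio of each factor; the response values are synthetic placeholders, not the measured data, and the function name is hypothetical. The P-value would then follow from the F distribution with the factor and error degrees of freedom (for example with `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-free.

```python
L9 = [(1, 1, 1), (1, 2, 2), (1, 3, 3),
      (2, 1, 2), (2, 2, 3), (2, 3, 1),
      (3, 1, 3), (3, 2, 1), (3, 3, 2)]


def anova_main_effects(design, y, factor_names):
    """Sum of squares, percent contribution and F-ratio per factor."""
    n = len(y)
    grand_mean = sum(y) / n
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    rows = {}
    for col, name in enumerate(factor_names):
        levels = sorted({row[col] for row in design})
        ss = 0.0
        for level in levels:
            vals = [v for row, v in zip(design, y) if row[col] == level]
            ss += len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        rows[name] = {"ss": ss, "dof": len(levels) - 1}
    # Whatever the main effects do not explain is treated as error.
    ss_error = max(ss_total - sum(r["ss"] for r in rows.values()), 0.0)
    dof_error = (n - 1) - sum(r["dof"] for r in rows.values())
    ms_error = ss_error / dof_error if dof_error > 0 else 0.0
    for r in rows.values():
        r["percent"] = 100.0 * r["ss"] / ss_total if ss_total > 0 else 0.0
        r["f_ratio"] = (r["ss"] / r["dof"]) / ms_error if ms_error > 0 else None
    return rows


# Synthetic response (NOT measured data): strong first factor, weak second
# factor, plus a small arbitrary perturbation acting as noise.
noise = [0.2, -0.1, 0.0, -0.2, 0.1, 0.0, 0.1, 0.0, -0.1]
demo_y = [10.0 * r[0] + r[1] + e for r, e in zip(L9, noise)]
demo = anova_main_effects(L9, demo_y, ("VDD", "Wp", "Wn"))
```

With nine runs and three three-level factors, only two degrees of freedom remain for the error term, so F-ratios and P-values from such a design should be read as indicative rather than definitive.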

Table 4 presents the models derived through ANOVA for each of the output performance metrics. The results of the analysis reveal that VDD exhibits the lowest P-value among the control factors for delay and power dissipation, while Wn shows the lowest P-value for PDP. Furthermore, Eqs. (7)–(9), developed based on the ANOVA results, describe the mathematical relationship between the input factors and their corresponding output parameters, providing a predictive framework for design optimization.

$$\text{Delay}=154.29-21.49\,\text{V}_\text{DD}-4.75\,\text{W}_\text{p}+1.05\,\text{W}_\text{n}$$
(7)
$$\text{Power dissipation}=4.759+0.7423\,\text{V}_\text{DD}+0.1352\,\text{W}_\text{p}+0.0195\,\text{W}_\text{n}$$
(8)
$$\text{Power-delay product (PDP)}=0.7536-0.0105\,\text{V}_\text{DD}-0.0062\,\text{W}_\text{p}+0.0109\,\text{W}_\text{n}$$
(9)
Table 4 Showing P-value from ANOVA for different control factors

To achieve optimal performance with respect to power consumption, delay, and PDP, the proposed NORA-based design was refined using the ANOVA-based optimization approach. The predicted outcomes generated through this method are listed in Table 5. The optimized values obtained through the Taguchi DoE (VDD = 1 V, Wp = 0.9 μm, Wn = 1.8 μm) were implemented in the design and simulation of the proposed NORA circuit to evaluate the effectiveness and reliability of the Taguchi DoE and ANOVA techniques. Simulation results demonstrate that the proposed design achieves a delay of 121.8 ps, a power consumption of 6.11 µW, and a PDP of 0.744 fJ.

Table 5 presents a comparison between the simulated outputs of the proposed circuit and the predicted results derived using the Taguchi DoE and ANOVA optimization techniques. The close alignment between the simulated and predicted results confirms the accuracy and reliability of the statistical approach.
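As an illustration of how the regression models of Eqs. (7) to (9) can be used for prediction, the sketch below evaluates them at the optimized operating point and compares the result with the simulated values quoted in the text. It assumes that VDD is entered in volts and the widths in μm, and that the outputs are in ps, µW, and fJ; these units are inferred from the surrounding text, and the predicted values listed in Table 5 may have been obtained differently.

```python
def predict(vdd, wp, wn):
    """Evaluate Eqs. (7)-(9); assumed units: V and um in, ps / uW / fJ out."""
    return {
        "delay": 154.29 - 21.49 * vdd - 4.75 * wp + 1.05 * wn,
        "power": 4.759 + 0.7423 * vdd + 0.1352 * wp + 0.0195 * wn,
        "pdp": 0.7536 - 0.0105 * vdd - 0.0062 * wp + 0.0109 * wn,
    }


def percent_error(predicted, simulated):
    """Signed deviation of the prediction from the simulation, in percent."""
    return {k: 100.0 * (predicted[k] - simulated[k]) / simulated[k]
            for k in simulated}


# Simulated values quoted in the text for VDD = 1 V, Wp = 0.9 um, Wn = 1.8 um.
SIMULATED = {"delay": 121.8, "power": 6.11, "pdp": 0.744}

predicted = predict(vdd=1.0, wp=0.9, wn=1.8)
errors = percent_error(predicted, SIMULATED)
```

Under these assumptions the models give roughly 130.4 ps, 5.66 µW, and 0.757 fJ at the optimized point, so `errors` provides a quick way to quantify how closely a linear model tracks the simulation.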

Table 5 Showing comparison between the simulated results of the proposed NORA based circuit and the predicted values obtained from statistical optimization techniques

6 Simulation results

In this section, both the existing technique and the proposed NORA-based technique for cascaded DyCML gates are analyzed using a 45 nm CMOS technology node with a supply voltage of 1 V. The two techniques are evaluated with respect to delay, power dissipation, and power-delay product. To ensure a fair and consistent comparison, all DyCML gates are configured to operate with a 500 mV voltage swing. The output waveforms for the existing cascaded DyCML gates with a self-timed buffer (STB) [31, 37] and for the proposed NORA-based technique are illustrated in Fig. 8.

Fig. 8

(a) Output waveforms of cascaded DyCML gates with self-timed buffer [31, 37] (b) Output waveforms of proposed NORA-based technique in cascaded DyCML gates

6.1 PVT analysis

A PVT (Process, Voltage, and Temperature) analysis of the proposed NORA-based technique for cascaded DyCML gates has been conducted to determine variations in transient characteristics. The supply voltage is varied from 1 V to 1.4 V, and the temperature from 0 °C to 80 °C. Measurements are taken for various output parameters, including propagation delay, power dissipation, and power-delay product. A summary of the PVT analysis for the existing cascaded DyCML gates with self-timed buffer (STB) [31] and the proposed NORA-based technique is presented in Table 6. In the process variation analysis of the proposed NORA-based technique, the best-case propagation delay of 97.31 ps occurs at the FF corner, together with the peak power consumption of 6.467 µW. In contrast, the worst-case propagation delay of 156.2 ps is observed at the SS corner, where the power consumption is at its lowest, 5.703 µW.
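The corner figures quoted above can be turned into power-delay products with a one-line calculation. The sketch assumes that PDP is simply the product of the quoted delay and power at each corner; the PDP values tabulated in Table 6 may be computed differently.

```python
def pdp_attojoule(delay_ps, power_uw):
    """PDP in aJ: 1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J = 1 aJ."""
    return delay_ps * power_uw


# Delay (ps) and power (uW) pairs quoted in the text.
pdp_ff = pdp_attojoule(97.31, 6.467)      # FF corner
pdp_ss = pdp_attojoule(156.2, 5.703)      # SS corner
pdp_nominal = pdp_attojoule(121.8, 6.11)  # optimized nominal design, Sect. 5
```

Under this assumption the FF corner gives about 629 aJ and the SS corner about 891 aJ, while the nominal point reproduces the 0.744 fJ reported in Sect. 5. The lower power at the SS corner therefore does not compensate for its longer delay.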

Table 6 PVT analysis of existing cascaded DyCML gates with self-timed buffer [31] and proposed NORA-based technique for cascaded DyCML gates

Figure 9 depicts the process corner analysis of cascaded DyCML gate designs under FF, FS, SF, TT, and SS conditions. The results confirm that the proposed NORA-based technique ensures reliable operation across all corners. Figure 10 shows the effect of supply voltage variation: delay increases as voltage decreases (Fig. 10a), while dynamic power decreases (Fig. 10b). The corresponding change in power-delay product is shown in Fig. 10c. Figure 11a, b, and c depict the effects of temperature on delay, power dissipation, and power-delay product, respectively.

Fig. 9

Process Corners results of Cascaded DyCML gates (a) Propagation Delay (b) Power Dissipation (c) Power-delay product

Fig. 10

Performance variation in parameters with respect to supply voltage (a) Propagation Delay (b) Power Dissipation (c) Power-delay product

Fig. 11

Performance variation in parameters with respect to temperature (a) Propagation Delay (b) Power Dissipation (c) Power-delay product

6.2 Monte Carlo analysis

A Monte Carlo simulation with 500 samples has been performed to assess the variation of the proposed NORA-based technique for cascaded DyCML gates under 3σ process variation. The propagation delay ranges from 108.717 ps to 135.597 ps, power consumption varies from 6.07278 µW to 6.15184 µW, and power-delay product extends from 664.1 aJ to 829.234 aJ. The deviations from the mean values for propagation delay, power consumption, and power-delay product are 3.6%, 0.21%, and 3.6%, respectively, indicating the robustness of the proposed NORA-based technique. The proposed circuit achieves better values than the existing technique, as shown in Table 7. Figure 12a, b, and c illustrate the histograms for propagation delay, power consumption, and power-delay product, respectively.

Table 7 Monte Carlo simulations of cascaded DyCML techniques
Fig. 12

Monte Carlo Analysis for the proposed NORA technique (a) Propagation delay (b) Power dissipation (c) Power-delay product
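The text does not define how the "deviation from the mean" is computed. One common reading is the relative standard deviation (σ/μ) of the Monte Carlo samples, and the following minimal sketch shows that calculation. The 500 Gaussian delay samples are hypothetical stand-in data generated only for illustration; they are not the simulation results of this work, and the function name `relative_deviation` is likewise an assumption.

```python
import random
import statistics


def relative_deviation(samples):
    """Return (mean, sample standard deviation, std/mean in percent)."""
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)
    return mean, std, 100.0 * std / mean


# Hypothetical stand-in data: 500 Gaussian delay samples in ps.
# These are NOT the paper's Monte Carlo results; mean and sigma are chosen
# only so that the numbers land in a plausible range.
rng = random.Random(0)
delays_ps = [rng.gauss(122.0, 4.4) for _ in range(500)]

mean, std, rel = relative_deviation(delays_ps)
print(f"mean = {mean:.2f} ps, sigma = {std:.2f} ps, sigma/mean = {rel:.2f} %")
```

The same function can be applied to the power and power-delay product samples exported from the simulator.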

6.3 Post layout simulations

To examine the impact of parasitics on the proposed circuit, its layout has been designed and post-layout simulations have been carried out. Figure 13 illustrates the layout of the proposed circuit, covering an area of 35.76 μm². The post-layout simulations enable a direct comparison with the schematic-level results, ensuring that parasitic effects are accurately accounted for. As summarized in Table 8, key performance metrics such as propagation delay, power dissipation, and power-delay product remain almost unchanged compared to the pre-layout simulations. This consistency between pre-layout and post-layout results highlights the robustness of the proposed NORA-based cascaded DyCML gates, demonstrating minimal impact from parasitics on circuit performance.

Table 8 Pre and post layout simulation results of proposed technique
Fig. 13

Layout of proposed NORA-based technique

6.4 Application example: realization of 4 × 1 multiplexer using NORA based technique

To illustrate the effectiveness of the proposed NORA-based technique in multistage applications, a 4 × 1 multiplexer has been implemented as a benchmark example. Figure 14 shows a 4 × 1 MUX realized by cascading with a self-timed buffer (STB) [45], which connects the first-stage 2 × 1 MUX to the second-stage 2 × 1 MUX [46]. In dynamic CML designs, direct cascading of gates is not possible, necessitating the insertion of self-timed buffers between stages [31, 37, 38]. In contrast, Fig. 15 shows the implementation of the 4 × 1 MUX using the proposed NORA-based style, which eliminates the need for a self-timed buffer. This removes the delay, power consumption, and area associated with the self-timed buffer. In the proposed 4 × 1 MUX design, the first-stage 2 × 1 MUX is based on the DyPFSCL architecture described in the literature [45, 46], and the same design approach is applicable to its differential counterpart, DyCML. The second-stage 2 × 1 MUX, however, is implemented using the proposed complementary dynamic CML gate.

Fig. 14

Block diagram of 4 × 1 MUX using cascaded DyCML with self-timed buffer [37, 38, 45]

Fig. 15

Block diagram of 4 × 1 MUX using the proposed NORA-based technique

All circuits are evaluated under the same simulation conditions as previously described, and the corresponding performance parameters are summarized in Sect. 6.5.2. These results clearly indicate that the 4 × 1 multiplexer implemented using the proposed scheme outperforms its counterparts designed with other logic styles. Table 9 presents the summary of the PVT analysis for the 4 × 1 MUX designed using the proposed NORA-based technique. A Monte Carlo simulation with 500 samples was carried out to evaluate the performance variations of the 4 × 1 MUX under 3σ process variation. Figure 16a, b, and c show the histograms for propagation delay, power consumption, and power-delay product, respectively. The deviations from the mean values for propagation delay, power consumption, and power-delay product are 10.9%, 0.85%, and 10.7%, respectively, indicating the robustness of the 4 × 1 MUX using the proposed technique.

Table 9 PVT analysis of the 4 × 1 MUX using proposed NORA-based technique
Fig. 16

Monte Carlo Analysis (a) Propagation delay (b) Power dissipation (c) Power-delay product

6.5 Performance comparison summary

To comprehensively evaluate the effectiveness of the proposed NORA-based DyCML technique, a detailed performance comparison has been conducted with existing DyCML circuit implementations available in the literature. The comparison aims to demonstrate the improvements achieved in terms of key performance parameters such as propagation delay, power dissipation, power-delay product (PDP), and implementation area. Both the cascaded DyCML gate configurations and complex logic realizations (such as the 4 × 1 multiplexer) have been analyzed under identical simulation conditions using 45 nm CMOS technology.

The comparative analysis highlights the advantages of the proposed design, emphasizing its capability to achieve high-speed and low-power operation with a reduced number of transistors, thereby improving circuit compactness and energy efficiency. The following subsections summarize the results for the cascaded DyCML gates and the 4 × 1 MUX implementations, respectively.

6.5.1 Performance comparison: cascaded DyCML gates

The proposed NORA-based technique for cascaded DyCML gates has been compared to the existing cascaded DyCML technique with a self-timed buffer (STB) [31]. Table 10 shows that the proposed technique achieves lower delay, power dissipation, and power-delay product than the existing method, while also reducing both transistor count and implementation area. Specifically, the proposed NORA-based technique achieves reductions of 69.55% in delay, 17.84% in power consumption, 74.97% in power-delay product, and 27.90% in area when compared to the existing technique.

Table 10 Performance comparison of existing cascaded DyCML with self-timed buffer [31] and proposed NORA based technique
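The percentage reductions quoted above can be cross-checked against one another. Assuming the power-delay product is the simple product of delay and power, the PDP reduction follows from the separate delay and power reductions. The sketch below illustrates this relation; the function names are hypothetical, and the relation is an assumption about how the figures combine rather than the method used in this work.

```python
def reduction_pct(existing: float, proposed: float) -> float:
    """Percentage reduction of `proposed` relative to `existing`."""
    return 100.0 * (existing - proposed) / existing


def pdp_reduction_pct(delay_red_pct: float, power_red_pct: float) -> float:
    """PDP reduction implied by separate delay and power reductions.

    Assumption: PDP = delay x power, so the remaining fractions multiply.
    """
    remaining = (1.0 - delay_red_pct / 100.0) * (1.0 - power_red_pct / 100.0)
    return 100.0 * (1.0 - remaining)


# Delay and power reductions quoted in the text for the cascaded DyCML gates.
print(f"{pdp_reduction_pct(69.55, 17.84):.2f} %")  # prints "74.98 %"
```

The implied value of about 74.98% agrees with the reported 74.97% reduction in power-delay product to within the rounding of the input percentages.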

6.5.2 Performance comparison: 4 × 1 multiplexers using various cascaded techniques

To further validate the scalability and performance of the proposed NORA-based DyCML technique, a 4 × 1 multiplexer (MUX) was implemented and analyzed alongside other established logic styles under identical simulation conditions. The comparative results, presented in Table 11, include parameters such as transistor count, propagation delay, power dissipation, and power-delay product (PDP).

The results clearly indicate that the proposed NORA-based 4 × 1 MUX outperforms all other counterparts, including static CMOS, dynamic CMOS, static MCML, and cascaded DyCML with a self-timed buffer. The proposed design achieves reductions of 39.61% in delay, 19.27% in power dissipation, and 51.52% in PDP compared to the self-timed buffer-based DyCML MUX. These improvements are attributed to the elimination of intermediate buffering, reduced transistor count and efficient complementary clocking inherent to the NORA-based DyCML structure.

Overall, the proposed design demonstrates superior energy efficiency and high-speed performance, establishing its suitability for low-power, high-performance mixed-signal applications.

Table 11 Performance comparison of the 4 × 1 MUX using other cascaded techniques available in the literature and the proposed NORA-based technique

7 Conclusion

The proposed race-free NORA-based DyCML technique improves multistage logic applications by eliminating the need for static MCML inverters or self-timed buffers between dynamic CML stages. Through the alternate use of N-type and P-type DyCML gates, the design enables direct cascading, leading to reductions in delay, power dissipation, power-delay product (PDP), and implementation area of 69.55%, 17.85%, 74.97%, and 27.90%, respectively, compared to the existing self-timed-buffer-based technique. Performance optimization using Taguchi Design of Experiments (DoE) and Analysis of Variance (ANOVA) ensures efficient transistor sizing and design tuning. To validate robustness and real-world applicability, post-layout simulations were conducted alongside process, voltage, and temperature (PVT) variation analysis and 500-sample Monte Carlo simulations. These analyses confirmed minimal performance deviations under varying conditions, demonstrating high reliability and resilience to mismatch and environmental fluctuations. Overall, the proposed NORA-based DyCML design, with its reduced hardware complexity and robust operation, offers a practical and efficient solution for low-power, high-speed pipelined circuit implementations in advanced CMOS technologies.