1 Introduction

The roaring demand for wireless connectivity at a low price point has, in recent years, spurred interest in highly integrated transceiver solutions that cut down on expensive silicon area. In this context, one of the major limiting factors is the frequency synthesizer used to generate the local oscillator signal for the transceiver. Conventionally implemented as phase-locked loops (PLLs) based around LC oscillators, such synthesizers require large amounts of area due to the use of integrated inductors. Ring-oscillator (RO) based frequency synthesizers, on the other hand, occupy less area, are inherently immune to magnetic pulling and are better suited to scaling. However, they also suffer from a worse power vs. phase noise tradeoff than their LC-based counterparts [1]. The resulting degradation in overall transceiver efficiency has so far prevented their more widespread adoption.

An effective way to overcome this issue, for applications that require very low integrated jitter levels but are not constrained by tight spot-noise requirements (e.g. IEEE 802.11b and high-performance clocking), is to perform an aggressive high-pass filtering of the RO phase noise. In principle, this could be achieved by increasing the bandwidth of the phase-locked loop (PLL) controlling the RO. In practice, however, the PLL bandwidth cannot be increased indefinitely, as it must remain well below the reference frequency to ensure stability [2]. The achievable level of filtering is therefore generally rather limited.

To increase the ring-oscillator phase noise filtering bandwidth beyond the limits set by conventional PLLs, two architectures have been proposed: multiplying delay-locked loops (MDLLs) [3,4,5,6,7,8,9,10] and injection-locked phase-locked loops (IL-PLLs) [11,12,13,14,15,16,17,18,19,20]. Both architectures suppress jitter accumulation by periodically realigning the ring-oscillator edges to a cleaner reference edge. In the IL-PLL case, this is achieved by enforcing the crossing times of the output signal through switch transistors, which only allows for partial realignment. MDLLs, instead, rely on a multiplexer (MUX) placed within the RO loop to fully substitute a recirculating edge with the cleaner reference one (Fig. 4.1). Since this effectively limits jitter accumulation in the RO to only one reference cycle, MDLLs achieve the higher filtering bandwidth of the two architectures, at about half of the reference frequency, \(f_{ref}/2\) [21]. As a result, they clearly represent the architecture of choice for highly efficient inductorless frequency synthesizers.

Fig. 4.1 MDLL architecture: (a) block diagram and (b) phase noise filtering capabilities

2 Fractional-N MDLLs

The basic MDLL architecture introduced in the previous section inherently requires that the output frequency be an integer multiple of the input one, so that precise edge substitution can be achieved. However, to provide a viable alternative to conventional LC-based PLLs, MDLLs should also be able to generate output frequencies that are not an integer multiple of the reference one, a concept commonly referred to as fractional frequency synthesis. Unfortunately, extending the MDLL architecture to fractional-N operation presents some extra challenges.

The conventional approach to enabling fractional-N frequency synthesis in MDLLs [8] is illustrated in Fig. 4.2. Similarly to a PLL, the modulus control, MC[k], of the feedback divider is dithered by a \(\Delta \Sigma \) modulator to achieve an average fractional division factor. For the simpler case of a first-order modulator, MC[k] is switched between two levels, N and \(N+1\). This, in turn, leads to a time error between the rising reference and oscillator edges, which follows a ramp from 0 to \(T_v\), where \(T_v\) is the oscillator period. To avoid the spectral degradation that would result from injecting such a large quantization error into the RO during edge replacement, a digital-to-time converter (DTC), i.e. an element that applies a digitally controlled delay to a signal, is placed on the reference path to realign the injected edges to the recirculating RO edges. The control signal for the DTC, del[k], is derived by first accumulating the \(\Delta \Sigma \) quantization error, to account for the intrinsic frequency-to-phase integration in the multi-modulus divider (MMD), and then scaling it by a proper gain so as to match the DTC’s bit-to-time conversion gain. As a result, the required DTC range is set by the amplitude of the \(\Delta \Sigma \) quantization error being canceled.
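The control scheme described above can be illustrated with a small behavioral sketch. It is not taken from [8]: the function name `conventional_frac_n`, the use of exact fractions, and the sign convention (DTC delay equal to the accumulated quantization error, in units of \(T_v\)) are assumptions made purely for illustration.

```python
from fractions import Fraction

def conventional_frac_n(n_int, alpha, cycles):
    """Behavioral model of a first-order delta-sigma divider control.

    n_int  : integer part of the multiplication factor (N)
    alpha  : fractional part, 0 <= alpha < 1, as a Fraction
    Returns a list of (MC[k], del[k]) pairs, with del[k] in units of T_v.
    """
    acc = Fraction(0)              # accumulated quantization error, in [0, 1)
    out = []
    for _ in range(cycles):
        delay = acc                # time error to be absorbed by the DTC
        total = acc + alpha
        carry = int(total >= 1)    # 1-bit quantizer: selects N or N + 1
        acc = total - carry        # residue carried over to the next cycle
        out.append((n_int + carry, delay))
    return out

seq = conventional_frac_n(n_int=16, alpha=Fraction(3, 8), cycles=8)
mc = [m for m, _ in seq]
delays = [d for _, d in seq]
print("MC[k]  :", mc)
print("del[k] :", [str(d) for d in delays])
print("average division factor:", Fraction(sum(mc), len(mc)))
```

In this model the average of MC[k] over one full period of the accumulator equals \(N + \alpha\), while del[k] sweeps almost the entire interval from 0 to \(T_v\), which is what sets the DTC range in the conventional scheme.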

Fig. 4.2 Conventional fractional-N MDLL: (a) implementation and (b) signal diagrams

Fig. 4.3 Performance gap in state-of-the-art inductorless frequency synthesizers

Unfortunately, the DTC also degrades the reference signal by introducing both random and deterministic jitter, due to component noise and to nonlinearities in its bit-to-time characteristic [22], respectively. Whereas in PLLs this poses a limited issue, since reference-path noise is largely suppressed by their narrow loop bandwidth, MDLLs suffer from a severe degradation in the output spectrum as a result of their much larger injection bandwidth. In fact, since the reference signal is used by MDLLs to provide a baseline for the jitter reset in the RO, DTC jitter is transferred to the output as-is. As illustrated by the plot in Fig. 4.3, this additional burden leads to a substantial performance gap between the jitter-power-product figure-of-merit (FoM) of integer-N and fractional-N inductorless frequency synthesizers, which prevents the latter from being adopted in more demanding applications.

3 Jitter-Power Tradeoff Analysis

To overcome the limitations of the conventional fractional-N MDLL architecture and enable low-jitter and low-power operation, it is crucial to gain an in-depth understanding of its fundamental design tradeoffs. For the case of PLLs, [25] provides appropriate guidelines to minimize the jitter-power product, in terms of an optimum loop bandwidth and power partitioning ratio among building blocks. However, since MDLLs rely mainly on edge-replacement to achieve oscillator phase noise filtering, shifting the loop bandwidth would have little effect on the overall output jitter. Therefore, an analytical expression for the jitter-power product should first be derived for the specific case of MDLLs, and then analyzed to determine which degrees of freedom are available to the designer to optimize the overall system performance.

To this end, accurate yet simple expressions for the oscillator and reference path jitter contributions can be derived by leveraging the spectral estimates developed in [26] through a time-variant modeling approach. The following assumptions—which hold in almost all practical cases—will be considered for simplicity: (i) the output jitter is white-noise limited, i.e. the contribution of 1/f noise to the overall spectrum is negligible, and (ii) the phase noise filtering effect of the tuning loop is negligible compared to that given by the much larger injection bandwidth.

The output jitter contribution due to the RO can be derived by first approximating the MDLL output phase noise spectrum through a Lorentzian function [21]:

$$\begin{aligned} S_{\phi ,ro}^{(out)}(f) \ = \ \frac{K_{inj}}{1+\left( f/f_{inj}\right) ^2} \end{aligned} \qquad (4.1)$$

where \(K_{inj}\) and \(f_{inj}\) represent the estimates derived in [26] for the low-frequency plateau and equivalent filtering bandwidth of an edge-realigned RO, given by:

$$\begin{aligned} {\begin{matrix} K_{inj} \ &{}= \ \mathcal {L}_{ro}(f_{ref}) \cdot \frac{4\pi ^2}{3} \frac{(N-1)(N-0.5)}{N^2}\\ f_{inj} \ &{}= \ f_{ref} \cdot \frac{\sqrt{1.5}}{\pi } \frac{N}{\sqrt{(N-1)(N-0.5)}} \end{matrix}} \end{aligned} \qquad (4.2)$$

where \(\mathcal {L}_{ro}(f_{ref})\) is the single-sideband-to-carrier ratio of the free-running oscillator, evaluated at the reference frequency, \(f_{ref}\). The corresponding output phase noise variance, \(\sigma ^2_{\phi ,ro}\), can then be derived by symbolic integration of (4.1). Scaling the result to obtain jitter leads to:

$$\begin{aligned} \sigma ^2_{t,ro} \ = \ \frac{\sigma ^2_{\phi ,ro}}{(2\pi f_{out})^2} \ = \ \mathcal {L}_{ro}(f_{ref}) \cdot \frac{1}{N f_{out}\sqrt{6}} \end{aligned} \qquad (4.3)$$

where the multiplication factor has been assumed to be \(N\gg 1\). To link the jitter contribution to the respective power dissipated in the RO, \(P_{ro}\), the commonly adopted figure-of-merit for oscillators, i.e. \(\text {FoM}_{ro} = 10\log _{10} [\mathcal {L}_{ro}(f_{ref}) \cdot (\nicefrac {f_{ref}}{f_{out}})^2 \cdot (\nicefrac {P_{ro}}{\text {1mW}})]\), can be substituted in the previous expression. This ultimately results in:

$$\begin{aligned} \sigma ^2_{t,ro} \ = \ \frac{10^\frac{\text {FoM}_{ro}}{10}}{P_{ro}} \cdot \frac{N}{f_{out}\sqrt{6}} \end{aligned} \qquad (4.4)$$
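The \(N\gg 1\) simplification used for the RO jitter can be checked numerically. The sketch below is only illustrative: the function names and the example values (a 1.6 GHz output and \(\mathcal {L}_{ro}(f_{ref}) = -139\) dBc/Hz) are assumptions, and the phase variance is taken as the Lorentzian integrated over both sidebands, i.e. \(\pi K_{inj} f_{inj}\).

```python
import math

def ro_jitter_variance_exact(l_ro_lin, f_out, n):
    """RO jitter variance from the Lorentzian estimate, without assuming N >> 1."""
    f_ref = f_out / n
    shape = (n - 1) * (n - 0.5)
    k_inj = l_ro_lin * (4 * math.pi ** 2 / 3) * shape / n ** 2
    f_inj = f_ref * math.sqrt(1.5) / math.pi * n / math.sqrt(shape)
    # Lorentzian integrated over both sidebands: 2 * (K * f_inj * pi / 2)
    sigma_phi_sq = math.pi * k_inj * f_inj
    return sigma_phi_sq / (2 * math.pi * f_out) ** 2

def ro_jitter_variance_approx(l_ro_lin, f_out, n):
    """Closed-form result for N >> 1: L_ro(f_ref) / (N * f_out * sqrt(6))."""
    return l_ro_lin / (n * f_out * math.sqrt(6))

f_out = 1.6e9                      # illustrative output frequency
l_ro_lin = 10 ** (-139.0 / 10)     # illustrative L_ro(f_ref), linear units
for n in (4, 16, 64):
    exact = ro_jitter_variance_exact(l_ro_lin, f_out, n)
    approx = ro_jitter_variance_approx(l_ro_lin, f_out, n)
    print(n, math.sqrt(exact), math.sqrt(approx), exact / approx)
```

The ratio between the two results reduces to \(\sqrt{(N-1)(N-0.5)}/N\), so the closed form slightly overestimates the variance for small N and the error shrinks as N grows.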

Since MDLLs rely on the reference edges to provide a baseline to which the RO edges are periodically reset [26], the output jitter contribution due to the reference path is, instead, transferred from the input as-is, i.e. \(\sigma ^2_{t,ref} = \sigma ^{2}_{t,in}\). To also link this contribution to the corresponding power consumption, an appropriate figure-of-merit can be introduced. Under the assumption that the reference path components (i.e. DTC and buffers) are CMOS-based, their jitter variance can be shown to be proportional to the reference clock frequency and inversely proportional to the dissipated power [23]. This suggests the following figure-of-merit:

$$\begin{aligned} \text {FoM}_{ref} = 10\log _{10} [ (\nicefrac {\sigma ^2_{t,ref}}{\text {1s}^2}) (\nicefrac {\text {1Hz}}{f_{ref}}) ( \nicefrac {P_{ref}}{\text {1mW}}) ] \end{aligned} \qquad (4.5)$$

As a result, the reference path jitter contribution can be expressed as:

$$\begin{aligned} \sigma ^2_{t,ref} \ = \ \frac{10^\frac{\text {FoM}_{ref}}{10}}{ P_{ref}} \cdot \frac{f_{out}}{N} \end{aligned} \qquad (4.6)$$

To derive an expression for the overall jitter-power product figure-of-merit (FoM) [25] for MDLLs, (4.4) and (4.6) can be summed and multiplied by the total power consumption, \(P_{ro}+P_{ref}\), leading to:

$$\begin{aligned} 10^\frac{\text {FoM}}{10} \ = \ N(1+R) \cdot \frac{10^\frac{\text {FoM}_{ro}}{10}}{f_{out}\sqrt{6}} \ + \ \frac{1}{N} \left( 1 + \frac{1}{R} \right) \cdot f_{out} \cdot 10^\frac{\text {FoM}_{ref}}{10} \end{aligned} \qquad (4.7)$$

where the ratio between reference path and RO power has been defined as \(R = P_{ref} / P_{ro}\). Given that the reference and RO contributions in (4.7) exhibit opposite dependencies on N and R, it is reasonable to assume that a global minimum for the jitter-power product may indeed exist. To determine its value, the partial derivatives of (4.7) with respect to N and R are taken and set to zero. The resulting system of two equations in two unknowns can be solved for N and R, leading to the following expressions for their optimum values:

$$\begin{aligned} {\left\{ \begin{array}{ll} \ \ N_ {opt} &{}= \ \root 4 \of {6} \cdot f_{out} \cdot 10^ {\ (\text {FoM}_{ref}-\text {FoM}_{ro})/{20}} \\ \ \ R_{opt} &{}= \ 1 \end{array}\right. } \end{aligned} \qquad (4.8)$$

That is, the lowest jitter-power product is obtained when oscillator and reference path power dissipation are balanced, i.e. \(P_{ro} = P_{ref}\), and when an optimum reference frequency (i.e. the multiplication factor, N) is selected. The corresponding expression of the optimum jitter-power-product figure-of-merit can be found by plugging (4.8) into (4.7), which results in:

$$\begin{aligned} \text {FoM}_{opt} \ = \ \frac{1}{2} \left[ \text {FoM}_{ref} + \text {FoM}_{ro} \right] + 4 \ \text {dB} \end{aligned} \qquad (4.9)$$
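The closed-form optimum can be cross-checked against a direct evaluation of the jitter-power product. The following sketch is a numerical sanity check under assumed, purely illustrative FoM values (\(-160\) dB for the RO and \(-325\) dB for the reference path at a 2 GHz output); the function names are hypothetical. The 4 dB constant is evaluated as \(10\log _{10}(4/\root 4 \of {6})\).

```python
import math

def jitter_power_product(n, r, f_out, fom_ro_db, fom_ref_db):
    """Linear jitter-power product as a function of N and R = P_ref / P_ro."""
    a = 10 ** (fom_ro_db / 10) / (f_out * math.sqrt(6))   # RO term coefficient
    b = f_out * 10 ** (fom_ref_db / 10)                   # reference term coefficient
    return n * (1 + r) * a + (1 + 1 / r) * b / n

def optimum(f_out, fom_ro_db, fom_ref_db):
    """Closed-form optimum: (N_opt, R_opt, FoM_opt in dB)."""
    n_opt = 6 ** 0.25 * f_out * 10 ** ((fom_ref_db - fom_ro_db) / 20)
    fom_opt_db = 0.5 * (fom_ref_db + fom_ro_db) + 10 * math.log10(4 / 6 ** 0.25)
    return n_opt, 1.0, fom_opt_db

f_out, fom_ro, fom_ref = 2e9, -160.0, -325.0   # illustrative values only
n_opt, r_opt, fom_opt = optimum(f_out, fom_ro, fom_ref)
best = jitter_power_product(n_opt, r_opt, f_out, fom_ro, fom_ref)
print("N_opt =", n_opt, " FoM_opt =", fom_opt, "dB")
print("direct evaluation:", 10 * math.log10(best), "dB")

# Any perturbation of N or R should only increase the product.
for scale_n in (0.5, 0.8, 1.25, 2.0):
    for scale_r in (0.5, 0.8, 1.25, 2.0):
        worse = jitter_power_product(n_opt * scale_n, r_opt * scale_r,
                                     f_out, fom_ro, fom_ref)
        print(scale_n, scale_r, 10 * math.log10(worse / best), "dB penalty")
```

The penalty table gives a feel for how flat the optimum is: moderate deviations from \(N_{opt}\) or from a balanced power split cost only a fraction of a dB, which is useful when N is constrained by the available reference frequency.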

Since the optimum FoM value in (4.9) depends linearly on the sum of the individual RO and reference FoMs, the system efficiency can in principle be further improved by acting on either of those two quantities. In practice, however, the ring-oscillator component is bound by thermodynamic limits to a minimum value of \(-165\) dB [28], which can hardly be improved. The reference path, on the other hand, contains a DTC to operate the MDLL in fractional-N mode, which provides additional degrees of freedom to be leveraged. In fact, the analysis presented in [23] suggests two key guidelines in this regard:

  • CMOS DTCs should be preferred over fully-differential implementations, since their jitter-power performance is remarkably superior in typical application cases;

  • For a given DTC architecture, reducing the required delay-range provides the main and most effective way to decrease jitter and thus improve \(\text {FoM}_{ref}\).

In addition to the jitter-power product, several other DTC design tradeoffs benefit from a reduction of its range as well. DTC nonlinearity, for example, also depends on the delay range [29]. Reducing it therefore has a positive impact on linearity and, in turn, on calibration complexity and fractional-spur performance. Furthermore, since the individual delay cells typically dominate the area required for a given DTC design, reducing the range is also beneficial to the area occupation.

4 DTC Range-Reduction Technique

As outlined in the previous section, reducing DTC range entails several advantages for fractional-N MDLL design. Nevertheless, given that proper edge synchronization has to be preserved in order not to degrade the output spectrum, achieving any significant range reduction represents a nontrivial task. To overcome these limitations, Fig. 4.4 introduces a technique that, by acting on both the injection path and the tuning loop, achieves a substantial reduction in DTC range without incurring any edge-misalignment issues [10].

Fig. 4.4 DTC range-reduction technique: (a) schematic and (b) corresponding signal diagrams

In regard to the injection path, range reduction is achieved as follows. Assuming that the oscillator duty-cycle is 50% (Footnote 1), an opposite polarity edge is available every \(T_v/2\). In principle, since only an alignment to the nearest edge is necessary for the injection to be performed correctly, the DTC range can be reduced to \(T_v/2\). If a specific RO edge then happens to be of opposite polarity with respect to the reference one, correct realignment can still be recovered by leveraging a differential oscillator implementation, and simply swapping the injected signal around. The corresponding signal diagrams, for the simpler case of a first-order \(\Delta \Sigma \)-modulator, are shown in Fig. 4.4b. Conventionally, the delay required from the DTC follows a ramp from 0 to \(T_v\), as a result of the quantization noise amplitude introduced by the \(\Delta \Sigma \)-modulator. By resetting the DTC control word in the second part of the delay-ramp, i.e. after a maximum \(T_v/2\) delay has been reached, the rising reference edges become aligned with falling edges in the oscillator. To match edge polarity, the reference signal is then swapped around according to the value of a control signal, s[k], which is set to 1 during the second part of the ramp.

The s[k] control signal is derived via a successive requantization of the frequency-control word (FCW), as shown in Fig. 4.4a. A multiplication by two (i.e. a shift left) is first performed on the input FCW, so that all bits of the fractional part—except for the MSB—are requantized by the first \(\Delta \Sigma \)-modulator. Its output is then divided by two (i.e. shifted right) to restore the correct fractional information. The resulting signal is then fed to a modulo-2 accumulator—which essentially behaves like a single-bit first-order \(\Delta \Sigma \)-modulator—to complete the requantization of the fractional part, providing a dithered control signal for the integer divider placed in the frequency acquisition loop. The accumulated quantization error from the first \(\Delta \Sigma \) is used as new control signal for the DTC, whereas the sum output of the modulo-2 accumulator finally represents the inversion-control signal, s[k].
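The two-step requantization can be mimicked with a short integer-arithmetic model. This is a behavioral sketch under stated assumptions, not the circuit of Fig. 4.4a: the function name `range_reduced_control`, the word length `bits`, and the bookkeeping in half-period units (which makes the divide-by-two implicit) are all illustrative choices.

```python
def range_reduced_control(fcw_frac, bits, cycles):
    """Behavioral model of the DTC range-reduction control path.

    fcw_frac / 2**bits is the fractional part of the FCW.
    Returns per-cycle tuples (mc_carry, s, dtc_word), where
    dtc_word / 2**(bits + 1) is the DTC delay in units of T_v.
    """
    full = 1 << bits
    err = 0     # accumulated error of the first modulator (drives the DTC)
    s = 0       # sum bit of the modulo-2 accumulator (polarity control)
    rows = []
    for _ in range(cycles):
        total = err + 2 * fcw_frac       # FCW multiplied by two (shift left)
        y1 = total >> bits               # first modulator output, in half periods
        next_err = total & (full - 1)    # residue kept for the next cycle
        t = s + y1                       # modulo-2 accumulation of half periods
        rows.append((t >> 1, s, err))    # carry dithers the integer divider
        s = t & 1
        err = next_err
    return rows

BITS, FRAC = 3, 3                        # fractional part 3/8, illustrative
for k, (mc_carry, s, err) in enumerate(range_reduced_control(FRAC, BITS, 16)):
    conventional = ((FRAC * k) % (1 << BITS)) / (1 << BITS)   # ramp up to T_v
    reduced = err / (1 << (BITS + 1))                          # stays below T_v / 2
    print(k, mc_carry, s, reduced, conventional)
```

In this model the conventional delay always equals the reduced DTC delay plus \(s[k] \cdot T_v/2\), so the half-period that the DTC no longer provides is exactly the one recovered by the polarity swap, and the divider carries are the same as in the conventional scheme.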

In the tuning loop, the DTC-reset method described so far would lead to a square-wave-like time error at the TDC input, with an amplitude of \(T_v/2\), caused by the delay that is no longer introduced by the DTC. To maintain lock under these conditions, a power-hungry multi-bit time-to-digital converter (TDC) would generally be required to track the error. Then, to avoid spurious modulation of the RO, this error would additionally have to be canceled at the TDC output. To overcome these issues and allow for low-power and low-jitter operation, a 1-bit TDC operated in sub-sampling mode is leveraged as follows. Conventionally, 1-bit TDCs are used to detect time errors in a narrow range around \(\Delta t = 0\), for which they exhibit an equivalent linear gain, \(K_{pd}\) [24]. However, by connecting the TDC in sub-sampling mode, i.e. by allowing it to directly sample the oscillator signal instead of the divider output, the time error can be detected with respect to all oscillator edges, virtually extending its range well beyond the region around \(\Delta t = 0\). In fact, as illustrated by the lower part of Fig. 4.4b, this results in a 1-bit TDC characteristic with a period of \(T_v\) and a gain of opposite sign every \(T_v/2\). Therefore, the deterministic square-wave-like time error just shifts the operating point for noise detection in the 1-bit TDC to either the \(\Delta t = 0\) region or the \(\Delta t = -T_v/2\) one. Since both provide an average linear gain, phase detection is not compromised. To recover the correct time-error sign, the 1-bit TDC output signal, e[k], is then simply inverted according to the value of s[k].
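The periodic characteristic and the sign recovery can be sketched with an idealized model. The assumptions here are an ideal 50% duty-cycle square-wave oscillator, a noiseless sampler, and a sign convention in which a positive time error samples the oscillator just after its rising edge; the function names are hypothetical.

```python
def osc_level(t_ps, t_v_ps):
    """Ideal 50 % duty-cycle oscillator: +1 in the first half period, -1 in the second."""
    return 1 if (t_ps % t_v_ps) < t_v_ps / 2 else -1

def tdc_output(delta_t_ps, s, t_v_ps):
    """1-bit sub-sampling TDC decision, inverted whenever s[k] = 1."""
    e = osc_level(delta_t_ps, t_v_ps)   # raw decision: periodic in T_v
    return -e if s else e

T_V = 625.0                             # oscillator period in ps (1.6 GHz)
for eps in (-10.0, -1.0, 1.0, 10.0):    # small residual time errors in ps
    around_zero = tdc_output(eps, 0, T_V)
    around_half = tdc_output(-T_V / 2 + eps, 1, T_V)
    print(eps, around_zero, around_half)
```

For each residual error, both operating points return the same decision, whose sign follows the sign of the error. Without the inversion, the decisions taken around \(-T_v/2\) would have the opposite sign and push the loop in the wrong direction.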

5 Implemented Architecture

Fig. 4.5 Block diagram of the implemented MDLL prototype

Figure 4.5 shows the block diagram of the proposed fractional-N MDLL architecture, which has been implemented in a standard 65 nm CMOS process [10]. The system leverages the proposed DTC range-reduction technique and the results from the jitter-power tradeoff analysis, to achieve both low-jitter and low-power fine fractional-N frequency synthesis.

The MDLL is based around a five-stage pseudo-differential ring oscillator, which is tuned via current-starved NMOS transistors [8]. A simple pulser circuit, based on an AND-gate edge detector, identifies the rising edges of the DTC-delayed reference signal to be injected, ref\(_\text {dtc}\), and controls the multiplexer accordingly. A swapping-MUX—i.e. a transmission-gate-based multiplexer with an embedded polarity reverser—is used to selectively swap the polarity of the differential injection signal, whenever \(s[k] = 1\). Since static timing offsets between the injection and tuning paths would lead to reference spurs in the MDLL output spectrum, an automatic time offset compensation is additionally used [8].

Fine frequency tuning is achieved via the previously introduced 1-bit TDC sub-sampling loop. Coarse frequency acquisition is instead achieved by means of a digital frequency-locked loop (FLL), based on a variant of [27]. It relies on a low-power, five-level TDC to sense the coarse timing difference between rising edges of the DTC-delayed reference signal, ref\(_\text {dtc}\), and divider output, div. The TDC output is then fed to a digital loop filter, which provides the coarse tuning information for the RO. Once locking has been achieved, the mid-tread characteristic of the five-level TDC ensures that the FLL enters an automatic dead-zone state (with negligible power consumption), which is only left if a significant phase disturbance is sensed between ref\(_\text {dtc}\) and div. Since the DTC range-reduction technique introduces a residual \(T_v/2\) time error between ref\(_\text {dtc}\) and div, false triggering of the five-level TDC may become an issue in fractional-N mode. To avoid this, the s[k] control signal is also used in the FLL to selectively resample the output of the integer-N divider, with either the rising or the falling edge of the oscillator (out). This effectively introduces a \(T_v/2\) additional delay on the divided signal, which compensates for the reduced DTC range on the reference path.

The DTC is segmented into a coarse- and a fine-resolution stage, both of which are based on a CMOS implementation in order to improve the overall efficiency. The coarse DTC is implemented as a cascade of buffer cells, with an embedded multiplexer that sets the effective length of the delay line. The fine DTC is instead implemented by digitally varying the capacitive load of a CMOS inverter, and thus its delay. Two cross-latched inverters are then additionally used to generate the required pseudo-differential DTC output. The bit-to-time conversion gain of the two DTCs is adjusted in the background by a digital calibration block, which also compensates for their nonlinearity and mismatches.

Since ring oscillators are subject to process, voltage and temperature (PVT) variations that cause their duty cycle to vary, rising and falling edges will not be exactly \(T_v/2\) apart. This, in turn, would lead to a misalignment between the reference signal and the recirculating edges every time a polarity reversal is performed by the swapping-MUX. To avoid the resulting degradation in the output spectrum, a least-mean-square (LMS) based duty-cycle corrector (DCC) has also been implemented. The DCC operates in the background and provides an output value which, summed to the DTC control word, cancels the timing mismatches between reference and RO edges through the DTC itself.

To minimize the overall jitter-power product, the MDLL multiplication factor has been chosen according to (4.8), and the power budget for the RO and reference-path components has been equalized as closely as possible. Overall, the blocks running at the reference frequency dissipate 1.64 mW at 100 MHz, and introduce about 300 fs RMS jitter, leading to \(\text {FoM}_{ref} = -328\) dB. The RO, instead, dissipates \(860 \ \upmu \)W and exhibits \(-119\) dBc/Hz phase noise at an offset of 10 MHz, leading to \(\text {FoM}_{ro} = -164\) dB. As a result, the optimum value for the multiplication factor is \(N_{opt} = 16\), with a corresponding expected theoretical \(\text {FoM}_{opt} =-242\) dB (Footnote 2).
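The figures quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes a 1.6 GHz output frequency and a \(-20\) dB/decade phase noise slope, so that the oscillator FoM can be evaluated at the 10 MHz offset instead of at \(f_{ref}\); the function names are hypothetical.

```python
import math

def fom_ref_db(sigma_t_s, f_ref_hz, p_ref_mw):
    """Reference-path FoM: 10*log10(sigma_t^2 / f_ref * P_ref), P_ref in mW."""
    return 10 * math.log10(sigma_t_s ** 2 / f_ref_hz * p_ref_mw)

def fom_ro_db(pn_dbc_hz, f_offset_hz, f_out_hz, p_ro_mw):
    """Oscillator FoM, assuming a 1/f^2 phase noise profile (offset-independent)."""
    return (pn_dbc_hz
            + 20 * math.log10(f_offset_hz / f_out_hz)
            + 10 * math.log10(p_ro_mw))

F_OUT = 1.6e9                                       # assumed output frequency
fom_ref = fom_ref_db(300e-15, 100e6, 1.64)          # 300 fs, 100 MHz, 1.64 mW
fom_ro = fom_ro_db(-119.0, 10e6, F_OUT, 0.86)       # -119 dBc/Hz @ 10 MHz, 0.86 mW

n_opt = 6 ** 0.25 * F_OUT * 10 ** ((fom_ref - fom_ro) / 20)
fom_opt = 0.5 * (fom_ref + fom_ro) + 10 * math.log10(4 / 6 ** 0.25)

print("FoM_ref =", round(fom_ref, 1), "dB")
print("FoM_ro  =", round(fom_ro, 1), "dB")
print("N_opt   =", round(n_opt, 1))
print("FoM_opt =", round(fom_opt, 1), "dB")
```

With unrounded FoM values the optimum multiplication factor comes out slightly below 16, whereas the rounded values of \(-328\) dB and \(-164\) dB give about 15.8. Either way, the result supports the choice of \(N = 16\), and the predicted optimum FoM rounds to \(-242\) dB.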

6 Measurement Results

The prototype, whose die micrograph is shown in Fig. 4.6, has been implemented in a standard 65 nm CMOS process. It occupies a total core area of 0.0275 mm\(^2\), with 0.0175 mm\(^2\) reserved for the digital core and 0.01 mm\(^2\) for the analog blocks (excluding the output buffer). The system is capable of fine fractional-N frequency synthesis in the 1.6-to-3.0 GHz range, with a resolution of around 190 Hz. At 1.6 GHz, the synthesizer core dissipates 2.5 mW from a 1.2 V supply.

Figure 4.7 provides the phase noise measurement in both the integer-N and fractional-N modes, as well as the free-running ring-oscillator profile, around 1.6 GHz. The corresponding RMS jitter values (integrated from 30 kHz to 30 MHz) are 334 fs and 397 fs, for the integer-N and the fractional-N case, respectively. At 1 MHz offset from the carrier, the phase noise level is \(-122.37\) dBc/Hz in the fractional-N mode.

Fig. 4.6 Die micrograph of the MDLL prototype, implemented in 65 nm CMOS

Fig. 4.7 Phase noise spectra and corresponding jitter level, measured in the fractional-N mode, integer-N mode and open-loop ring-oscillator

Table 4.1 provides a summary of the measured performance, as well as a comparison to other state-of-the-art fractional-N inductorless frequency synthesizers. In the fractional-N mode, the synthesizer reaches a jitter-power FoM of \(-244\) dB, achieving an almost 10 dB improvement over the previous state of the art, and effectively bridging the gap to integer-N implementations (see Fig. 4.3). The corresponding bandwidth-normalized FoM\(_\text {norm}\), which accounts for the limited jitter integration bandwidth in measurements [10], is \(-240\) dB. The 2 dB discrepancy with respect to the theoretical \(-242\) dB prediction derived in Sect. 4.5 is likely due to a residual power imbalance between oscillator and reference path, as well as to the Lorentzian approximation used for the spectra.

Table 4.1 Performance comparison

7 Conclusion

The increasing demand for low-cost wireless solutions drives the pursuit of frequency synthesizers with very small overall area occupation. In this chapter, the design of a highly compact yet efficient inductorless frequency synthesizer has been presented. Based on a multiplying delay-locked loop architecture, the system achieves both low-jitter and low-power fractional-N operation, by leveraging the results from a system-level jitter-power tradeoff analysis, combined with the introduction of a novel DTC range-reduction technique. The synthesizer, implemented in a standard 65 nm CMOS process, achieves a record jitter-power FoM of \(-244\) dB in the fractional-N mode, in a compact 0.0275 mm\(^2\) core area.