1 Introduction

Systems-on-chip (SoCs) employed in biomedical implants or autonomous Internet-of-Things sensor nodes have to operate from days to years on one battery charge, which puts tight requirements on the integrated components [1]. The information processing of slow biomedical or environmental signals usually takes place at rates from a single sample to a few thousand samples per second. Stable low-frequency clock sources and always-on timers operating within a power budget of only a few nanowatts are strongly desired [2]. At the same time, fast system start-up shortens the expensive active time of wireless sensor nodes, which are woken remotely from a sleep state [1].

Conventional discrete crystal oscillators offer pure spectral properties, but modern highly-integrated systems with rigid price and size constraints need alternatives. Hence, integrated clock sources, which come cheap, CMOS-compatible, and energy-efficient, become increasingly popular. Ring oscillators, RC oscillators, and relaxation oscillators (ROSCs) are the most common types, but suffer from increased inaccuracies such as clock jitter and sensitivity to process, voltage and temperature (PVT) variations [3].

In the last decade, substantial improvements to the voltage-mode ROSC topology have been presented: A decision element (DE), i. e. a comparator, compares a reference voltage with the voltage across an integration capacitor which is charged by a current source or via a resistor. Modern ROSC designs are an appropriate choice for low-power SoCs because of superior PVT tolerance, 50% duty cycle, acceptable jitter, and excellent power-efficiency [4, 5]. On the other hand, biomedical and IoT applications need timers and real-time clock generators with low oscillation frequencies, which brings up two major challenges:

  1. (I)

    The relaxation principle results in voltage slewing at the input of the DE [1]. Fast high-gain comparators are desired for low shunt currents and low timing error, but require increased power consumption. The trade-off between shunt currents and comparator speed, power and area becomes a burden for oscillation frequencies below several kilohertz. As a promising alternative, long-time delay circuits utilize CMOS-compatible thyristor structures formed of gate-coupled PMOS and NMOS transistors, which effectively reduce shunt currents [6]. Large delay times can be achieved with very low switching power [7]. Several thyristors have already been combined into low-power ring oscillators [8, 9], raising interest in their employment in ROSCs.

  2. (II)

    Varying temperature affects the oscillator’s accuracy during operation. While supply- and process-dependent frequency deviations can be handled by regulated supplies and additional trimming circuits adjusted during production test, temperature stability requires on-chip compensation circuitry. Digital feedback loops and look-up tables can support the circuit at the expense of increased system complexity [10]. Analog methods such as stabilized current references can reduce the temperature coefficient (TC) substantially, but the loop delay variation associated with comparators and logic remains [3]. Tsubaki et al. [11] propose an additional reference voltage generation cycle, which cancels out the delay variation differentially. Another compensation method is the employment of a current reference with temperature behavior matched to the loop TC [12]. A complementary-to-absolute-temperature (CTAT) current reference proves quite effective, but the additional analog source component increases the total area and power of the oscillator circuit [13].

In this paper, which is an extension of work originally presented at the 2021 IEEE Nordic Circuits and Systems Conference (NorCAS) [14], we present a voltage-mode ROSC with three novel contributions: First, we propose CMOS thyristor circuits as power- and area-efficient DEs for critical voltage slewing. Second, a matched CTAT current reference handles the thyristor temperature variation. An increased negative TC is generated with low area and power by reusing one \(V_{\textrm{BE}}\) of the proportional-to-absolute-temperature (PTAT) generator typically existing in SoCs. Finally, the presented ROSC is aimed especially at heavily duty-cycled low-power SoCs in IoT and biomedical applications. In order to enable precise wake-up of entire systems, we propose a modified current source start-up circuit. Compared to the original work from [14], we provide additional details on the operation of the thyristor DEs and investigate the sources of non-ideal temperature behavior. We review the compensation and calibration mechanisms, including the TC enhancement and the fast start-up circuit. Additional detailed analysis and measurement results are presented.

This paper is organized as follows. In Sect. 2, the conventional ROSC architecture is reviewed. The proposed circuit is introduced in Sect. 3 and implementation details of thyristor DEs, current source with \(V_{\textrm{BE}}\)-reuse, and TC compensation are discussed in Sect. 4. Section 5 shows experimental results and Sect. 6 presents conclusions.

2 Architecture of conventional relaxation oscillator

The simplified block diagram of a conventional ROSC suitable for low-frequency clock generation is shown in Fig. 1. Voltage-mode ROSCs are composed of one or two identical integration capacitors \(C_{\textrm{0}}\) and \(C_{\textrm{1}}\), voltage reference \(V_{\textrm{ref}}\) and DEs, which are typically realized as dynamic CMOS comparators. The differential output clock with frequency \(f_{\textrm{clk}}=1/t_{\textrm{clk}}\) is the difference of switch signals \(\phi _0\) and \(\phi _1\) produced by a digital RS latch. The charging current can be provided from a resistor forming an RC element, but low-power and low-speed designs require \(I_{\textrm{osc}}\) in the range of several nanoamperes. In order to prevent excessive resistor noise and area, a combination of capacitors and current reference \(I_{\textrm{osc}}\) is the preferred solution [2].

Fig. 1 Simplified block diagram of conventional relaxation oscillator

2.1 Conventional oscillation mechanism

The waveforms in a conventional ROSC are shown in Fig. 2. The path of \(I_{\textrm{osc}}\) is changed periodically with \(t_{\textrm{clk}}/2\), which is controlled by alternating \(\phi _0\) and \(\phi _1\). The operation principle is explained assuming \(\phi _0=1\) and \(\phi _1=0\): The current source charges \(C_{\textrm{0}}\), while \(C_{\textrm{1}}\) is discharged. The voltage \(V_{\textrm{C0}}\) at the positive comparator terminal is determined by charge equality and increases linearly with slew rate \(\textrm{d}V_{\textrm{C0}}/\textrm{d}t\),

$$\begin{aligned} V_{\textrm{C0}}C_{\textrm{0}}&=\frac{I_{\textrm{osc}} t_{\textrm{clk}}}{2} \end{aligned}$$
(1)
$$\begin{aligned} \frac{\textrm{d}V_{\textrm{C0}}}{\textrm{d}t}&=\frac{I_{\textrm{osc}}}{2 C_{\textrm{0}}}\,. \end{aligned}$$
(2)
Fig. 2 Waveforms in conventional relaxation oscillator [14]

Eventually, the charging node \(V_{\textrm{C0}}\) reaches the decision threshold of the comparator at \(V_{\textrm{ref}}\) and the output signal \(\textrm{d}_0\) resets the RS latch. As a result, \(\phi _0\) switches to 0 and \(\phi _1\) to 1, which repeats the operation for the \(C_{\textrm{1}}\) branch, while \(C_{\textrm{0}}\) is switched to ground. The relaxation time period depends on the capacitor sizes, \(I_{\textrm{osc}}\) and \(V_{\textrm{ref}}\),

$$\begin{aligned} t_{\textrm{relax}}=\frac{(C_{\textrm{0}}+C_{\textrm{1}}) V_{\textrm{ref}}}{I_{\textrm{osc}}}\,. \end{aligned}$$
(3)

Second order terms \(t_{\mathrm{2^{nd}}}\) such as comparator offset \(V_{\textrm{os}}\), comparator delay \(t_{\textrm{d,comp}}\) and logic delay \(t_{\textrm{d,logic}}\) play a larger role with increasing frequency,

$$\begin{aligned} t_{\textrm{clk,conv}}&=t_{\textrm{relax}} + \frac{(C_{\textrm{0}} +C_{\textrm{1}}) V_{\textrm{os}}}{I_{\textrm{osc}}}\nonumber \\&\quad + 2\,t_{\textrm{d,comp}} + {2\,t_{\textrm{d,logic}}} \end{aligned}$$
(4)
$$\begin{aligned}&= t_{\textrm{relax}}+t_{\mathrm{2^{nd}}}\,. \end{aligned}$$
(5)

The influence of PVT variations on \(t_{\textrm{relax}}\) can be reduced sufficiently with trimming and temperature compensation methods applied to \(I_{\textrm{osc}}\) and \(V_{\textrm{ref}}\) [15]. The second-order terms can show complex PVT dependencies and require more sophisticated compensation methods, which increase complexity, area and power consumption [10]. Fortunately, in low-frequency designs with output frequencies in the range of several kilohertz, the overall clock period equals, to a good approximation, \(t_{\textrm{relax}}\), so that no compensation of second-order terms is required. Note that the circuit of Fig. 1 demands perfectly matched comparator offsets and capacitors for a duty cycle of 50%.
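As a minimal numerical sketch of (3) to (5), the following Python snippet evaluates the relaxation term and the second-order terms separately. All component values are hypothetical and chosen only for illustration; they are not taken from any of the cited designs.

```python
def conventional_period(c0, c1, v_ref, i_osc, v_os=0.0, t_d_comp=0.0, t_d_logic=0.0):
    """Return (t_relax, t_second, t_clk) in seconds following Eqs. (3)-(5)."""
    # Eq. (3): ideal relaxation period
    t_relax = (c0 + c1) * v_ref / i_osc
    # Second-order terms of Eq. (4): offset, comparator delay, logic delay
    t_second = (c0 + c1) * v_os / i_osc + 2.0 * t_d_comp + 2.0 * t_d_logic
    # Eq. (5): total clock period
    return t_relax, t_second, t_relax + t_second


# Hypothetical example values (assumptions, not from the paper)
t_relax, t_second, t_clk = conventional_period(
    c0=1e-12, c1=1e-12,      # 1 pF integration capacitors
    v_ref=0.5,               # 0.5 V reference
    i_osc=10e-9,             # 10 nA charging current
    v_os=5e-3,               # 5 mV comparator offset
    t_d_comp=50e-9,          # 50 ns comparator delay
    t_d_logic=5e-9,          # 5 ns logic delay
)

print(f"t_relax  = {t_relax * 1e6:.2f} us")
print(f"t_second = {t_second * 1e6:.3f} us")
print(f"share of second-order terms = {t_second / t_clk * 100:.2f} %")
```

With these assumed values the second-order terms contribute roughly one percent of the period, which illustrates why they are often neglected in kilohertz-range designs.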

2.2 Shunt currents at decision threshold

The average power consumption of a voltage-mode ROSC can be noted down as

$$\begin{aligned} P_{\textrm{osc}} = V_{\textrm{DD}}(I_{\textrm{osc}} + 2I_{\textrm{de}} + I_{\textrm{rs}} + I_{\textrm{bias}})\,. \end{aligned}$$
(6)
Fig. 3 Shunt current in DE as \(V_{\textrm{C0}}\) approaches decision threshold

In (6), \(I_{\textrm{de}}\) and \(I_{\textrm{rs}}\) are dynamic currents associated with the DEs and the latch, respectively, and \(I_{\textrm{bias}}\) results from the bias generator block. Figure 3 provides a closer view of the integration voltage \(V_{\textrm{C0}}\) at the moment of decision making. The operation of the \(V_{\textrm{C1}}\) branch is analogous. Capacitor \(C_{\textrm{0}}\) is charged within \(t_{\textrm{clk}}/2\) to \(V_{\textrm{ref}}\) with a visible slewing according to (1). Assuming a low clock frequency, the integration voltage increases slowly and drives the DE into the meta-stable region around the decision threshold \(V_{\textrm{ref}}\) during \(t_{\textrm{meta}}\), before the output actually switches. Exactly at the threshold, \(I_{\textrm{de}}\) is at its maximum. The dynamic comparator draws shunt current \(I_{\textrm{sh}}\), resulting in high power consumption [16]. Especially in low-frequency clock sources and timers employed in low-power SoCs, designers are forced to implement high-speed, high-gain comparators to cope with slewing. The increased static power consumption and complexity of such circuits burden the overall power and area budget. According to the literature, conventional ROSCs following the principle of Fig. 1 achieve FoMs up to 15 nW/kHz and areas up to 0.11 mm\(^{2}\) [2, 11].

3 Proposed thyristor-based relaxation oscillator

Compared to the conventional architecture, the thyristor-based ROSC proposed in Fig. 4 employs (i) CMOS thyristor decision elements instead of classical dynamic comparators, (ii) only one integration capacitor, (iii) a non-overlapping clock generator producing four clock phases \(\phi _0\), \(\overline{\phi _0}\), \(\phi _1\) and \(\overline{\phi _1}\), (iv) no voltage reference, and (v) a programmable 1:1 current mirror (PCM).

Fig. 4 Block diagram of proposed relaxation oscillator with simplified CMOS thyristor decision elements [14]

3.1 Operation principle

The operation of the proposed circuit is depicted in Fig. 5. It is in general similar to the conventional one. Put simply, the thyristor DEs \(T_{\textrm{0}}\) and \(T_{\textrm{1}}\) behave like inverters and toggle their state when the input voltage reaches the trigger voltage \(V_{\textrm{trig}}\). Assume the non-overlapping signals \(\phi _0=1\), \(\overline{\phi _0}=0\), \(\phi _1=0\) and \(\overline{\phi _1}=1\). Pass-gate transistor \(M_{\textrm{00}}\) connects the left terminal of \(C_{\textrm{osc}}\) and \(I_{\textrm{osc}}\) to the input of thyristor \(T_{\textrm{0}}\). The right terminal of \(C_{\textrm{osc}}\) is connected to ground. With this configuration, \(V_{\textrm{C0}}\) rises with a slew rate defined by (1). When \(T_{\textrm{0}}\)’s trigger voltage \(V_{\textrm{trig}}\) is reached, its output \(\overline{\textrm{d}_0}\) toggles and resets the RS latch. The \(V_{\textrm{C1}}\) branch is activated with \(\phi _0=\overline{\phi _1}=0\) and \(\overline{\phi _0}=\phi _1=1\). The positive feedback mechanism and the small transistor sizes inside the thyristor shorten the transition time, which reduces the peak magnitude and duration of \(I_{\textrm{de,T0}}\) and \(I_{\textrm{de,T1}}\) compared to high-gain dynamic comparators. With (4), the clock period can be calculated as

$$\begin{aligned} t_{\textrm{clk,prop}}&=\frac{2 C_{\textrm{osc}} V_{\textrm{trig}}}{I_{\textrm{osc}}} + 2\,t_{\textrm{thyr}} + {2\,t_{\textrm{d,logic}}}\nonumber \\&\approx \frac{2 C_{\textrm{osc}} V_{\textrm{trig}}}{I_{\textrm{osc}}} + 2\,t_{\textrm{thyr}}\,, \end{aligned}$$
(7)
where the thyristor delay \(t_{\textrm{thyr}}\) is a non-linear, temperature-dependent term [17].

Fig. 5 Waveforms and shunt current in CMOS thyristor-based ROSC

4 Implementation details

The proposed ROSC is implemented in a 180 nm partially-depleted silicon-on-insulator (PD-SOI) CMOS technology. \(I_{\textrm{osc}}\) is chosen to be 50 nA and \(C_{\textrm{osc}}=300\) fF, resulting in an oscillation frequency of 140 kHz and a simulated average core power consumption of only 90 nW.
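To illustrate how these design values relate through (7), the following Python sketch solves the approximated period for the trigger voltage that would be required for 140 kHz. It is an idealized estimate under stated assumptions: \(t_{\textrm{thyr}}\) is neglected by default, and parasitic capacitances at the integration node are ignored, so the resulting voltage is only indicative.

```python
C_OSC = 300e-15      # integration capacitor, 300 fF (from the text)
I_OSC = 50e-9        # charging current, 50 nA (from the text)
F_TARGET = 140e3     # target oscillation frequency, 140 kHz (from the text)


def proposed_period(c_osc, v_trig, i_osc, t_thyr=0.0):
    """Clock period according to the approximation in Eq. (7)."""
    return 2.0 * c_osc * v_trig / i_osc + 2.0 * t_thyr


def implied_trigger_voltage(f_clk, c_osc, i_osc, t_thyr=0.0):
    """Invert Eq. (7) for V_trig (idealized, no parasitics)."""
    return (1.0 / f_clk - 2.0 * t_thyr) * i_osc / (2.0 * c_osc)


v_trig = implied_trigger_voltage(F_TARGET, C_OSC, I_OSC)
f_check = 1.0 / proposed_period(C_OSC, v_trig, I_OSC)

print(f"implied V_trig = {v_trig:.3f} V")        # about 0.595 V
print(f"round-trip f   = {f_check / 1e3:.1f} kHz")
```

Under these assumptions the implied trigger voltage of roughly 0.6 V lies in the range one would expect for an NMOS threshold voltage in such a technology.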

4.1 Thyristor decision element

The proposed N-type thyristor decision element is based on the conventional CMOS thyristor delay element, which is known for its large and reliable time constants suitable for low-power timing circuits [17]. Figure 6 shows the basic block diagram. The thyristor circuit is the CMOS equivalent of a classical semiconductor PNPN thyristor behaving like a surge relay between nodes a and b. The normally-off device is turned on by a single pulse at the ctrl node. The bottom part of Fig. 6 shows the core thyristor formed by gate-drain coupled \(M_{\textrm{P}}\) and \(M_{\textrm{N}}\) and complemented by additional reset switches, output inverter, and IC integrator circuit, resulting in a thyristor delay element frequently employed in literature [18, 19].

Fig. 6 Block diagram of one-stage thyristor delay element

Fig. 7 Waveforms in one-stage thyristor delay element

Fig. 8 Schematic of two-stage thyristor decision element

The operation of a thyristor delay element is shown in Fig. 7. Initially, \(\textrm{ctrl}=0\) and nodes \(V_{\textrm{C}}\) and \(V_{\textrm{thyr}}\) are reset to ground and \(V_{\textrm{DD}}\), respectively. A rising edge of \(\textrm{ctrl}\) starts the charging of capacitor C with slew rate \(\textrm{d}V_{\textrm{C}}/\textrm{d}t\), while the thyristor remains first in a stable off-state. Floating \(V_{\textrm{thyr}}\) stays around \(V_{\textrm{DD}}\), but it is slowly discharged by small \(M_{\textrm{N}}\) drain current. The thyristor trigger voltage equals approximately the NMOS threshold voltage \(V_{\textrm{THN}}\),

$$\begin{aligned} V_{\textrm{trig}}\approx V_{\textrm{THN}}\,. \end{aligned}$$
(8)

In the proposed thyristor-based ROSC, \(V_{\textrm{THN}}\) acts like the reference voltage in a conventional design. With \(V_{\textrm{C}}\approx V_{\textrm{THN}}\), a sharp transition occurs when the exponentially increasing \(I_{\textrm{N}}\) pulls down the gate of \(M_{\textrm{P}}\). In turn, the conducting \(M_{\textrm{P}}\) pulls up \(V_{\textrm{C}}\) in a positive feedback loop and the output signal switches to 1. The mechanism is sped up by choosing small transistor sizes, increasing the aspect ratio W/L of \(M_{\textrm{N}}\), and decreasing W/L of \(M_{\textrm{P}}\). However, a certain minimum length and area must be maintained for \(V_{\textrm{TH}}\) matching and noise. Note that no path between \(V_{\textrm{DD}}\) and ground exists at any moment, which means that, except for small leakage, no direct shunt current can occur in this configuration. The CMOS thyristor is hence tolerant to small slew rates and a popular device for generating micro- and millisecond delays efficiently [8]. The time delay between \(\textrm{ctrl}\) and out is calculated as

$$\begin{aligned} t_{\textrm{d}}=\frac{C V_{\textrm{THN}}}{I}+t_{\textrm{thyr}}\,, \end{aligned}$$
(9)

where the intrinsic delay \(t_{\textrm{thyr}}\) can be mathematically expressed as [17]

$$\begin{aligned} t_{\textrm{thyr}}=\root 3 \of {\frac{6\, n\,C^2_{\textrm{N}} C_{\textrm{P}} V_{\textrm{THP}}}{I_{\textrm{osc}}}} + t_{\textrm{reg}}\,. \end{aligned}$$
(10)

\(C_{\textrm{N}}\) and \(C_{\textrm{P}}\) are the parasitic capacitances associated with the gates of \(M_{\textrm{N}}\) and \(M_{\textrm{P}}\), respectively, and n is the slope factor. The effects of the fractional and root term can generally be minimized by small transistor sizes and optimized routing. The regeneration time \(t_{\textrm{reg}}\) plays a minor role up to very high clock frequencies [6]. Since (10) depends on technology-dependent parameters, it has to be determined in simulation or measurements. In our case, \(t_{\textrm{thyr}}\) is in the range of a few nanoseconds.

There are striking similarities between (9) and the operation of an ROSC noted down in (4). The utilization of CMOS thyristors as DEs appears natural. The final thyristor DE used in the proposed ROSC is depicted in Fig. 8. For simplification, only one DE is shown. The integration capacitor \(C_{\textrm{osc}}\) is shared between \(T_{\textrm{0}}\) and \(T_{\textrm{1}}\). The basic DE circuit is composed of two CMOS thyristor stages formed by \(M_{\textrm{P}}\), \(M_{\textrm{N}}\) and reset switches \(M_0\), \(M_1\) implemented as PMOS and NMOS, which are compensated for charge-injection by dummy transistors with half W/L. \(M_2\) acts as a current source for the second stage thyristor and inverts \(V_{\textrm{thyr,1}}\). Since the slew rate of \(V_{\textrm{thyr,1}}\) is limited by intrinsic gain, a following inverter INV1 would still show excessive shunt current. The second thyristor is added instead of INV1 in order to reduce total ROSC power consumption [8].

During design, the proposed oscillator was compared to a conventional ROSC with inverter trees as the simplest form of decision elements. In simulation, the shunt power consumption was lowered by almost three orders of magnitude, from 706 to 0.8 pW/kHz. Moreover, conventional dynamic comparators include large matching pairs, which contribute heavily to the overall area. With 22 µm × 19 µm in 180 nm technology, the area of the thyristor DE is minor.

4.2 Single relaxation capacitor and non-overlapping clock generator

As a second feature and in contrast to the conventional architecture, Fig. 4 uses simple NMOS switches and only one integration capacitor \(C_{\textrm{osc}}\) [20]. This is enabled by the fact that one of the capacitors \(C_0\) and \(C_1\) (refer to Fig. 1) remains shorted for half a period and is therefore redundant. Both plates of single integration capacitor \(C_{\textrm{osc}}\) are used for storing the integration voltages in different phases, which results in theory in an accurate 50% duty cycle, reduced branch mismatch, and circuit area. This enhancement comes with the drawback that signals \(\phi _0\) and \(\phi _1\) must be produced by a non-overlapping clock generator. It ensures that \(I_{\textrm{osc}}\) is always connected to at least one low-ohmic switch. Otherwise, \(I_{\textrm{osc}}\) pinches off and produces additional PVT-sensitive loop delay. A second disadvantage is that in reality, the non-overlapping delay \(t_{\textrm{nov}}\) distorts the duty cycle, as shown in Fig. 5. Due to the long clock period, this effect is minor in the design, but it must be considered for higher frequencies.

4.3 Temperature compensation

Another known challenge is the temperature sensitivity of the clock frequency. In (7), the NMOS and PMOS threshold voltages \(V_{\textrm{THN}}(T)\) and \(V_{\textrm{THP}}(T)\) appear in the relaxation term and in the \(t_{\textrm{thyr}}\) term, respectively:

$$\begin{aligned} t_{\textrm{clk,prop}}(T)=\frac{2 C_{\textrm{osc}} V_{\textrm{THN}}(T)}{I_{\textrm{osc}}} + 2\,t_{\textrm{thyr}}(T) \end{aligned}$$
(11)
Fig. 9 Simulated temperature behavior of ROSC frequency (uncompensated and compensated) and compensation current \(I_{\textrm{osc}}\)

With \(t_{\textrm{clk,prop}}\gg t_{\textrm{thyr}}\) as a good approximation, the function is linearly proportional to \(V_{\textrm{THN}}(T)\). The temperature dependency of the NMOS threshold voltage is defined as

$$\begin{aligned} V_{\textrm{THN}}(T) \approx V_{\textrm{THN}}(T_0) (1+\alpha _{\textrm{N}} (T-T_0))\propto T\,, \end{aligned}$$
(12)

where \(V_{\textrm{THN}}(T_0)\) is the NMOS threshold voltage at room temperature and \(\alpha _{\textrm{N}}\) is the first-order TC [20]. For long-channel devices in 180 nm technology, \(\alpha _{\textrm{N}}\) is simulated to be around −1200 ppm/K, i.e., \(V_{\textrm{THN}}\) and \(t_{\textrm{clk,prop}}\) decrease with temperature and the oscillation frequency shows a PTAT behavior. Simulation results for ROSC frequency versus temperature are shown in Fig. 9. Without compensation, a TC of about 2850 ppm/K is obtained, resulting in about 50 kHz deviation over the temperature range from −40 °C to 85 °C. The first-order compensation mechanism works as follows: A CTAT source injects the current \(I_{\textrm{ctat}}\) with matched TC \(\alpha _{\textrm{CTAT}}\approx -3450\) ppm/K via a PCM into the ROSC. Increasing temperature results in decreasing \(I_{\textrm{ctat}}\), which, in a linear approximation, compensates the circuit for the decreasing \(V_{\textrm{THN}}\). The resulting frequency curve of the total ROSC shows −350 ppm/K deviation, achieving an effective TC reduction by a factor of about 8. As explained, only the linear TC (first derivative) is compensated and the second-order term resulting from \(t_{\textrm{thyr}}(T)\) is ignored. Hence a slight overcompensation for temperatures above 50 °C is visible in Fig. 9, which was acceptable in our application.
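The quoted simulation figures can be cross-checked with a simple linear extrapolation. The Python sketch below assumes that the frequency drifts linearly with the stated TC over the full temperature range, which ignores the curvature visible in Fig. 9; it is meant only as a plausibility check of the numbers in the text.

```python
F_NOM = 140e3                 # nominal frequency in Hz (from the text)
T_MIN, T_MAX = -40.0, 85.0    # temperature range in degrees Celsius (from the text)
TC_UNCOMP = 2850e-6           # uncompensated TC in 1/K (simulated, from the text)
TC_COMP_SIM = -350e-6         # compensated TC in 1/K (simulated, from the text)


def linear_deviation(f_nom, tc, t_min, t_max):
    """Frequency change over [t_min, t_max] assuming a purely linear drift."""
    return f_nom * tc * (t_max - t_min)


dev_uncomp = linear_deviation(F_NOM, TC_UNCOMP, T_MIN, T_MAX)
dev_comp = linear_deviation(F_NOM, TC_COMP_SIM, T_MIN, T_MAX)
reduction = abs(TC_UNCOMP / TC_COMP_SIM)

print(f"uncompensated drift: {dev_uncomp / 1e3:.1f} kHz")
print(f"compensated drift:   {dev_comp / 1e3:.1f} kHz")
print(f"TC reduction factor: {reduction:.1f}")
```

The linear estimate yields a drift close to 50 kHz without compensation and a reduction factor slightly above 8, consistent with the values stated above.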

4.4 Current source with current source recycling and modified start-up circuit

\(I_{\textrm{osc}}\) is generated by the CTAT circuit shown in Fig. 10. The challenge of obtaining a high \(\alpha _{\textrm{CTAT}}\) of −3450 ppm/K is overcome by using bipolar junction transistors (BJTs) and resistors. In order to reduce the complexity of the circuit, the CTAT is supported by \(V_{\textrm{B}}\) from the on-chip PTAT bias reference, which helps to maintain the voltage across the resistor \(R_{2}=R_{\textrm{2,fix}}+R_{\textrm{2,trim}}\) equal to the forward voltage drop of the diode \(D_{1}\), as shown later. PTAT generators are standard components typically employed in mixed-signal SoCs to bias analog blocks such as amplifiers, filters, and analog-to-digital converters. In the context of a complete SoC, a high \(\alpha _{\textrm{CTAT}}\) is achieved with low additional power consumption and minor area for \(V_{\textrm{B}}\) routing. The overall chip area and power are hence not increased by the proposed PTAT recycling method.

Fig. 10 CTAT current source recycling PTAT generator

Fig. 11 Modified start-up circuit (right) versus conventional (left)

Fig. 12 Simulated starting sequence under nominal conditions, \(\textrm{en}=1\) at 0 ms

Fig. 13 One of the two one-hot coded trimming resistors

Fig. 14 Measured CTAT and PTAT currents versus trimming vector

The current source uses self-biased low voltage cascode current mirrors. The loop across \(M_{8}\), \(M_{10}\), \(M_{12}\), \(M_{13}\) forces the node voltages \(V_{\textrm{x}}=V_{\textrm{y}}=V_{\mathrm{EB_1}}\), where \(V_{\mathrm{EB_1}}\) is the emitter–base voltage of diode-connected BJT \(D_1\). The PTAT voltage and current, respectively, are expressed as [21]

$$\begin{aligned} V_{\textrm{PTAT}}= {V_{\mathrm{EB_1}}-V_{\mathrm{EB_2}}}=V_{\textrm{T}} \cdot \textrm{ln}(N) \end{aligned}$$
(13)

and

$$\begin{aligned} I_{\textrm{PTAT}}=\frac{V_{\textrm{PTAT}}}{R_{0}}=\frac{V_{\textrm{T}} \cdot \textrm{ln}(N)}{R_{0}}\,, \end{aligned}$$
(14)

where \(V_{\mathrm{EB_2}}\) is the emitter–base voltage of \(D_2\), \(V_{\textrm{T}}\) is the thermal voltage, and \(R_{\textrm{0}}\) is a series combination of fixed and programmable parts \(R_{\textrm{0,fix}}\) and \(R_{\textrm{0,trim}}\). Likewise, the CTAT current can be noted down as

$$\begin{aligned} I_{\textrm{ctat}}=\frac{V_{\textrm{ctat}}}{R_{2}} ={\frac{V_{\mathrm{EB_1}}}{R_{2}}}\,. \end{aligned}$$
(15)
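As a rough numerical illustration of (13) to (15), the following Python sketch evaluates the PTAT and CTAT currents. The emitter area ratio N, the resistor values and the emitter–base voltage are hypothetical placeholders and not the values used in the design; only the physical constants are exact.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K
Q_E = 1.602176634e-19     # elementary charge in C


def thermal_voltage(temp_k):
    """Thermal voltage V_T = kT/q in volts."""
    return K_B * temp_k / Q_E


def ptat_current(temp_k, n_ratio, r0):
    """Eq. (14): I_PTAT = V_T * ln(N) / R_0."""
    return thermal_voltage(temp_k) * math.log(n_ratio) / r0


def ctat_current(v_eb1, r2):
    """Eq. (15): I_ctat = V_EB1 / R_2."""
    return v_eb1 / r2


# Hypothetical values (assumptions): N = 8, R_0 = 1 MOhm, V_EB1 = 0.6 V, R_2 = 6 MOhm
i_ptat_300 = ptat_current(300.0, 8, 1e6)
i_ptat_330 = ptat_current(330.0, 8, 1e6)
i_ctat = ctat_current(0.6, 6e6)

print(f"I_PTAT(300 K) = {i_ptat_300 * 1e9:.1f} nA")
print(f"I_PTAT(330 K) = {i_ptat_330 * 1e9:.1f} nA")
print(f"I_ctat        = {i_ctat * 1e9:.1f} nA")
```

For an ideal, temperature-independent \(R_{0}\), the PTAT current scales linearly with absolute temperature, whereas the temperature behavior of \(I_{\textrm{ctat}}\) follows that of \(V_{\mathrm{EB_1}}\) and of \(R_{2}\).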

Neglecting the body effect for simplicity, the voltages \(V_{\textrm{x}}\), \(V_{\textrm{y}}\) and \(V_{\textrm{z}}\) can be approximately calculated to

$$\begin{aligned} V_{\textrm{x}}&=V_{\textrm{DD}} -| V_{\textrm{OV,8}} | - | V_{\textrm{OV,9}} | \nonumber \\&\quad - V_{\textrm{OV,12}} -V_{\textrm{TH,12}}\,, \end{aligned}$$
(16)

where \(V_{\textrm{DD}}\) is the supply voltage and \(| V_{\textrm{OV,8}}|\), \(| V_{\textrm{OV,9}}|\), \(V_{\textrm{OV,12}}\) are the overdrive voltages (\(V_{\textrm{OV}}=V_{\textrm{GS}}-V_{\textrm{TH}})\) of \(M_{8}\), \(M_{9}\), \(M_{12}\), respectively;

$$\begin{aligned} V_{\textrm{y}}= {V_{\textrm{DD}}-V_{\textrm{n}}-V_{\textrm{OV,13}}}\,, \end{aligned}$$
(17)

where cascode node voltage \(V_{\textrm{n}}=| V_{\textrm{OV,8}} | + | V_{\textrm{OV,9}} | + | V_{\textrm{TH,9}}|\) and \(| V_{\textrm{TH,9}} |\), \(V_{\textrm{OV,13}}\) are the threshold and overdrive voltage of \(M_{9}\), \(M_{13}\) respectively; and

$$\begin{aligned} V_{\textrm{z}}= {V_{\textrm{DD}}-V_{\textrm{p}}-V_{\textrm{OV,16}}}\,, \end{aligned}$$
(18)

where cascode node voltage \(V_{\textrm{p}} = | V_{\textrm{OV,17}} | + | V_{\textrm{OV,18}} | +| V_{\textrm{TH,18}} |\) and \(V_{\textrm{OV,16}}\), \(| V_{\textrm{OV,17}} |\), \(| V_{\textrm{OV,18}} |\) are the overdrive voltages of \(M_{16}\), \(M_{17}\), \(M_{18}\), respectively, and \(| V_{\textrm{TH,18}} |\) is the threshold voltage of \(M_{18}\).

Assuming all the transistors have the same threshold voltage \(V_{\textrm{TH}} = | V_{\textrm{TH}} |\) and overdrive voltage \(V_{\textrm{OV}} = | V_{\textrm{OV}} |\), (16), (17) and (18) can be written as

$$\begin{aligned} V_{\textrm{x}} = V_{\textrm{y}} = V_{\textrm{z}} = {V_{\textrm{DD}} -3V_{\textrm{OV}}-V_{\textrm{TH}}}\,. \end{aligned}$$
(19)
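The simplification leading to (19) can be checked numerically. The Python sketch below evaluates (16), (17) and (18) with per-transistor values, first under the equal-device assumption and then with a small threshold shift on \(M_{12}\). The supply, overdrive and threshold values are hypothetical and serve only to illustrate the algebra.

```python
def node_voltages(v_dd, v_ov, v_th):
    """Evaluate Eqs. (16)-(18).

    v_ov and v_th map a transistor index to the magnitude of its
    overdrive and threshold voltage in volts.
    """
    v_x = v_dd - v_ov[8] - v_ov[9] - v_ov[12] - v_th[12]   # Eq. (16)
    v_n = v_ov[8] + v_ov[9] + v_th[9]                      # cascode node V_n
    v_y = v_dd - v_n - v_ov[13]                            # Eq. (17)
    v_p = v_ov[17] + v_ov[18] + v_th[18]                   # cascode node V_p
    v_z = v_dd - v_p - v_ov[16]                            # Eq. (18)
    return v_x, v_y, v_z


INDICES = (8, 9, 12, 13, 16, 17, 18)

# Hypothetical equal-device case: V_DD = 1.8 V, V_OV = 0.15 V, V_TH = 0.70 V
v_ov = {i: 0.15 for i in INDICES}
v_th = {i: 0.70 for i in INDICES}
v_x, v_y, v_z = node_voltages(1.8, v_ov, v_th)
print(f"matched:    V_x={v_x:.3f} V, V_y={v_y:.3f} V, V_z={v_z:.3f} V")

# Same case with an assumed 10 mV threshold shift on M12 only
v_th_mis = dict(v_th)
v_th_mis[12] += 0.010
v_x_mis, v_y_mis, v_z_mis = node_voltages(1.8, v_ov, v_th_mis)
print(f"mismatched: V_x={v_x_mis:.3f} V, V_y={v_y_mis:.3f} V, V_z={v_z_mis:.3f} V")
```

In the matched case all three nodes evaluate to \(V_{\textrm{DD}}-3V_{\textrm{OV}}-V_{\textrm{TH}}\), while the assumed shift on \(M_{12}\) moves only \(V_{\textrm{x}}\) in this simplified model.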

After manufacturing, random mismatch affects \(V_{\textrm{TH}}\), and hence the \(V_{\textrm{OV}}\) differ for all transistors. However, layout techniques, proper sizing, and maintaining the same aspect ratios of the current mirrors (\(M_{8}\), \(M_{10}\), \(M_{14}\), \(M_{17}\)), (\(M_{9}\), \(M_{11}\), \(M_{15}\), \(M_{18}\)), (\(M_{12}\), \(M_{13}\), \(M_{16}\)) achieve a sufficient equalization of the voltages \(V_{\textrm{x}}=V_{\textrm{y}}=V_{\textrm{z}}=V_{\mathrm{EB_1}}\).

The modified start-up circuit for the current reference is shown on the right in Fig. 11. The conventional start-up circuit on the left side is made up of \(M_{3}\), \(M_{4}\), \(M_{5}\), while \(M_{6}\) and \(M_{7}\) are power-off switches. The operation is as follows: If the current source nodes are discharged, low \(V_{\textrm{B}}\) turns off \(M_{4}\) and \(V_{\textrm{su}}\) enables \(M_{\textrm{5}}\) for shorting the PMOS current mirror node \(V_{\textrm{m}}\) to \(V_{\textrm{B}}\). The circuit starts operating as soon as

$$\begin{aligned} V_{\textrm{B}} \ge V_{\textrm{TH,12}} + V_{\mathrm{EB_1}}, \end{aligned}$$
(20)

where \(V_{\textrm{TH,12}}\) is the threshold voltage of \(M_{12}\) and \(M_{13}\). It takes a long time to charge the gates of NMOS current mirror \(M_{12}\) and \(M_{13}\), until the current source reaches stable state and the oscillator produces target frequency.

In our application, a quick ROSC start-up is desired for short sensor node response. For this purpose, the classical start-up circuit is modified by the addition of \(M_{23}\) to charge \(V_{\textrm{B}}\) faster directly to \(V_{\textrm{DD}}\). Further, \(M_5\)’s source terminal is connected to ground instead of \(V_{\textrm{B}}\). The stacked diode \(M_{24}\) helps to reduce the static start-up current \(I_{\textrm{su}}\). Finally, in the conventional circuit, \(I_{\textrm{su}}\) flows even when \(\textrm{en}=0\). In the modified version, \(M_{25}\) acts as an additional switch to provide a true power-off. Figure 12 shows the simulated starting sequence of the CTAT circuit with conventional and modified start-up circuit. Under nominal conditions, with a 10 ms rise time of \(V_{\textrm{DD}}\), and with en switching to logic high at zero time, the simulated start-up time is reduced by a factor of about 3.6 from 1.63 ms to 0.45 ms.

A one-hot resistor trimming technique [22] is used across the resistors \(R_{\textrm{0,trim}}\) and \(R_{\textrm{2,trim}}\) in order to reduce the process variation of \(I_{\textrm{ptat}}\) and \(I_{\textrm{ctat}}\). The major portion of required resistance is provided by fixed parts \(R_{\textrm{0,fix}}\) and \(R_{\textrm{2,fix}}\). The variation is handled with resistor-chains \(R_{\textrm{0,trim}}\) and \(R_{\textrm{2,trim}}\) controlled by two 4 bit vectors \(\varvec{Y}_{\textrm{p}}\)[3:0] and \(\varvec{Y}_{\textrm{c}}\)[3:0]. They are converted to 16 one-hot coded digital signals each by binary-to-one-hot decoder blocks (Bin2OH), as shown in Fig. 13. The measured PTAT and CTAT currents with respect to the decimal trimming vectors are plotted in Fig. 14. The measurement results match the simulation results well.
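The behavior of a binary-to-one-hot decoder can be sketched in a few lines of Python. This is a purely behavioral illustration of the decoding step; the function name and the bit ordering of the output list are assumptions and do not describe the actual gate-level implementation of the Bin2OH blocks.

```python
def bin_to_one_hot(code, width=4):
    """Decode a binary trimming code into 2**width one-hot select signals.

    Element i of the returned list is 1 only for i == code.
    """
    n_out = 1 << width
    if not 0 <= code < n_out:
        raise ValueError(f"code must be in [0, {n_out - 1}]")
    return [1 if i == code else 0 for i in range(n_out)]


# Example: decimal trimming vector 5 selects exactly one of 16 taps
select = bin_to_one_hot(5)
print(select)
```

Exactly one select signal is active for every code, so each 4 bit trimming vector picks a single tap of the corresponding resistor chain.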

4.5 Programmable current mirror

Process variations affect the oscillation frequency not only through \(I_{\textrm{osc}}\), but as well via \(C_{\textrm{osc}}\) and \(V_{\textrm{THN}}\). With nominal \(V_{\textrm{DD}}\) and nominal temperature, the uncalibrated output frequency shows a simulated spread from 95.0 kHz to 217.3 kHz versus process corners and a simulated standard deviation of 4.91 kHz in 1000 points Monte Carlo mismatch simulation. In conventional designs, static variations are handled by trimming during production test. For this purpose, we suggest an additional wide-range 1:1 programmable current mirror between CTAT current source and oscillator core, which can be adjusted for all process variations.

Fig. 15 Programmable 1:1 partly-binary current mirror relaying \(I_{\textrm{ctat}}\) towards oscillator core

The schematic of the 7 bit PCM is shown in Fig. 15. It is implemented as partly-binary array of parallel and stacked current mirror transistors with weights [16, 8, 4, 2, 1, 0.5, 0.25] and switched cascode transistors controlled by vector \(\varvec{X}\). The output \(I_{\textrm{osc}}\) is assembled as

$$\begin{aligned} \frac{I_{\textrm{osc}}}{I_{\textrm{ctat}}}&=\frac{16}{32}X_{\textrm{6}} +\frac{8}{32} X_{\textrm{5}} + \ldots + \frac{0.25}{32} X_{\textrm{0}} + \frac{0.25}{32} \nonumber \\&=\sum _{k=0}^{6} \frac{2^k}{128} X_k + \frac{1}{128}\,. \end{aligned}$$
(21)

The PCM range starts at \(I_{\textrm{ctat}}/128\) for \(\varvec{X}=[0,0,\ldots ,0]\) and ends at \(I_{\textrm{ctat}}\) for \(\varvec{X}=[1,1,\ldots ,1]\), i.e., \(I_{\textrm{osc}}=I_{\textrm{ctat}}\). The corresponding trimming range of the oscillator frequency is simulated to extend from 4.37 kHz to 264.2 kHz under nominal conditions, which corresponds to a theoretical frequency resolution of ±2.05 kHz. In simulations including process, voltage and temperature variations, the oscillator could be trimmed towards the target of 140 kHz with a maximum absolute error of 1.8 kHz in all cases. Compared to a conventional 7 bit binary-weighted array using 128 unit transistors, the presented circuit achieves the same output range with only 73 devices, i.e., 57 % of the device count, and saves area accordingly. As a drawback, its linearity is more sensitive to parasitic layout resistances and matching. During the trimming procedure, the monotonicity of the frequency curve versus vector \(\varvec{X}\) is a critical concern. In simulations, its differential non-linearity (DNL) [23] has been evaluated. The DNL does not exceed 1.70 kHz, i.e., the curve is strictly monotonic.
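The transfer characteristic of (21) can be sketched behaviorally in Python. The snippet models only the ideal mirror ratio and derives the average frequency step from the simulated trimming range quoted above; mismatch, parasitics and the non-linearity of the real frequency curve are not modeled, and the function names are illustrative.

```python
# Mirror weights for X_0 ... X_6, as listed for the partly-binary array
WEIGHTS = [0.25, 0.5, 1, 2, 4, 8, 16]


def pcm_ratio(x_bits):
    """Ideal I_osc / I_ctat according to Eq. (21); x_bits[k] is X_k (0 or 1)."""
    switched = sum(w * x for w, x in zip(WEIGHTS, x_bits)) / 32
    always_on = 0.25 / 32          # fixed branch, gives the 1/128 offset
    return switched + always_on


def pcm_ratio_from_code(code):
    """Same ratio for an integer trimming code 0..127."""
    bits = [(code >> k) & 1 for k in range(7)]
    return pcm_ratio(bits)


ratio_min = pcm_ratio([0] * 7)     # I_ctat / 128
ratio_max = pcm_ratio([1] * 7)     # I_ctat
avg_step_hz = (264.2e3 - 4.37e3) / 127

print(f"min ratio = {ratio_min}, max ratio = {ratio_max}")
print(f"average frequency step = {avg_step_hz / 1e3:.2f} kHz per code")
```

The ideal ratio increases by 1/128 per code, and the average frequency step of about 2.05 kHz follows directly from the simulated range divided by the 127 code transitions.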

Note that during production test of the proposed ROSC, both current reference and PCM are trimmed in cascade. The initial trimming of \(I_{\textrm{ctat}}\) with \(\varvec{Y}\) towards 100 nA achieves a rough frequency resolution of around 16 kHz per step, which is not sufficient for our application. The additional PCM enables finer frequency adjustment in the range of two kilohertz per step. The zoom-like cascaded trimming with \(\varvec{X}\) and \(\varvec{Y}\) is necessary, since a wider programming range of \(I_{\textrm{ctat}}\) would alter its TC too much.
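The fine trimming step with \(\varvec{X}\) can be sketched as a search over a monotonic transfer curve. The Python example below is a hypothetical illustration, not the production test routine: the function `measure` stands in for a frequency measurement, and the linear frequency model built from the simulated maximum of 264.2 kHz is an assumption that ignores the non-linearity of the real curve.

```python
def trim_to_target(measure, target_hz, n_codes=128):
    """Find the code whose frequency is closest to target_hz.

    Assumes measure(code) increases strictly with code, so a binary
    search locates the first code at or above the target.
    """
    lo, hi = 0, n_codes - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(mid) < target_hz:
            lo = mid + 1
        else:
            hi = mid
    # Compare the crossing code with its lower neighbour
    candidates = [c for c in (lo - 1, lo) if 0 <= c < n_codes]
    return min(candidates, key=lambda c: abs(measure(c) - target_hz))


def ideal_frequency(code, f_full=264.2e3):
    """Hypothetical linear model: frequency proportional to the ideal PCM ratio."""
    return f_full * (code + 1) / 128


best_code = trim_to_target(ideal_frequency, 140e3)
residual = ideal_frequency(best_code) - 140e3

print(f"best code = {best_code}, residual error = {residual:.0f} Hz")
```

Because the search relies on monotonicity, it needs only about seven measurements per device instead of 128, which is one reason why the monotonic behavior of the PCM matters during trimming.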

5 Experimental results

Twenty-two samples of the ROSC were fabricated in a 180 nm PD-SOI CMOS technology and measured in the laboratory. The circuit is part of a low-power sensor readout SoC [24]. PD-SOI permits a more compact layout: because less substrate noise is coupled through the fully isolated wells, the oscillator and other noisy blocks can be placed closer to sensitive circuits. Figure 16 shows a die micro-photograph. The oscillator core and current reference cover an area of 0.03 mm\({}^2\) and 0.08 mm\({}^2\) (0.06 mm\({}^2\) without PTAT), respectively. The CTAT’s fixed and programmable resistors \(R_{\textrm{2,fix}}\) and \(R_{\textrm{2,trim}}\) occupy about 0.026 mm\({}^2\). Combined with the PCM and CTAT current mirrors, the analog components occupy more than 95% of the total ROSC area, as depicted in Fig. 17.

Fig. 16

Die micro-photograph of the complete SoC with the proposed ROSC

Fig. 17

Area break-down of proposed ROSC

Fig. 18

Frequency distribution of 22 samples (non-calibrated, \(T=27\) °C)

The stand-alone oscillator core consumes 641.6 nW, while the current reference consumes 265.8 nW without and 562.8 nW with the recycled PTAT. The total ROSC power consumption is therefore 907.4 nW excluding the recycled PTAT and 1.20 µW including it. In the best case, i. e. excluding the PTAT, the proposed circuit achieves 6.5 nW/kHz when oscillating at the target frequency of around 140 kHz.
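The power budget can be re-derived from the individual contributions with a short Python sketch. The input numbers are the measured values quoted above, whereas the function names and the exact 140 kHz operating point are illustrative assumptions.

```python
P_CORE_NW = 641.6        # stand-alone oscillator core
P_REF_NW = 265.8         # current reference without recycled PTAT
P_REF_PTAT_NW = 562.8    # current reference with recycled PTAT
F_TARGET_KHZ = 140.0     # assumed operating point (target frequency)


def total_power_nw(include_ptat):
    """Sum of oscillator core and current reference power in nW."""
    return P_CORE_NW + (P_REF_PTAT_NW if include_ptat else P_REF_NW)


def power_per_khz(include_ptat):
    """Power efficiency in nW/kHz at the assumed target frequency."""
    return total_power_nw(include_ptat) / F_TARGET_KHZ


if __name__ == "__main__":
    print(round(total_power_nw(False), 1))   # 907.4
    print(round(total_power_nw(True), 1))    # 1204.4, i.e. about 1.20 uW
    print(round(power_per_khz(False), 1))    # 6.5
```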

Fig. 19

Measured frequency versus temperature with compensation

Fig. 20

Measured transient output waveform obtained at different temperatures

Fig. 21

Measured frequency versus supply voltage

Fig. 22

Measured frequency versus trimming vector

Fig. 23

Measured jitter versus supply voltage

A histogram of the frequency distribution at nominal temperature is depicted in Fig. 18, showing a mean of 140.03 kHz. About 68% of the non-calibrated samples lie within (\(140\pm 3.38\)) kHz (one sigma), which corresponds to a process stability \(\sigma /\mu\) of 2.41%. For further analysis, five samples were randomly selected and characterized in more detail. The frequency deviation over a temperature range of \(-\)40 °C to 85 °C is plotted in Fig. 19 for the compensated case. The samples show on average \(-\)514.7 ppm/K, which matches the simulated value of \(-\)350 ppm/K if layout parasitics are considered. Compared to the uncompensated simulated case, the TC improves by a factor of 5.5. If increased circuit area and complexity are acceptable, these results can be further improved by additional trimming structures that adjust the TC of \(I_{\textrm{ctat}}\). For completeness, Fig. 20 shows measured oscillator output waveforms of one chip sample; well-behaved signal integrity is achieved at all tested temperatures. The supply voltage sensitivity is \(\{-0.84,+1.79\}\)% over the range of 1.5 V to 2.5 V, i. e. 2.62%/V, as depicted in Fig. 21. Figure 22 shows the frequency versus trimming vector setting. The transfer curve is not linear but strictly monotonic, which is sufficient for the final application. Finally, the cycle-to-cycle rising-edge jitter is plotted in Fig. 23. The proposed ROSC achieves an average jitter of 6.02 ns across supply voltage levels from 1.5 V to 2.5 V.
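The derived statistics can be cross-checked with a small Python sketch. The inputs are the measured values quoted above. The total drift over temperature is not stated in the text: it is computed here under the assumption of a constant, linear TC across the whole range and should be read as a rough estimate only.

```python
MEAN_KHZ = 140.03                 # mean of the 22 non-calibrated samples
SIGMA_KHZ = 3.38                  # one-sigma spread
TC_PPM_PER_K = -514.7             # average measured TC (compensated)
T_MIN_C, T_MAX_C = -40.0, 85.0    # characterized temperature range


def relative_spread_percent(sigma, mean):
    """Process stability sigma/mu expressed in percent."""
    return 100.0 * sigma / mean


def drift_percent(tc_ppm_per_k, t_min_c, t_max_c):
    """Total frequency change over the range, ASSUMING a constant TC."""
    # 1 ppm equals 1e-4 percent.
    return tc_ppm_per_k * (t_max_c - t_min_c) * 1e-4


if __name__ == "__main__":
    print(round(relative_spread_percent(SIGMA_KHZ, MEAN_KHZ), 2))     # 2.41
    print(round(drift_percent(TC_PPM_PER_K, T_MIN_C, T_MAX_C), 2))    # -6.43
```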

In Table 1, the CMOS thyristor-based ROSC is compared to state-of-the-art relaxation oscillators. The proposed ROSC achieves competitive numbers for TC, supply sensitivity and process stability, and shows a higher FoM at its higher supply voltage of 1.8 V. The other designs operate below 1.2 V, except for [20, 25, 26]. Compared to the only conventional ROSC, from [11] and exemplified in Fig. 1, a two times better FoM is achieved with a competitive area of 0.09 mm\({}^2\), excluding the recycled PTAT. The circuits in [2, 4, 5] show better efficiency, but they employ non-conventional architectures. Only [25] and [26] show a smaller area, but also a much worse FoM.

Table 1 Performance of the proposed ROSC in comparison to prior art

6 Conclusion

This paper presents a relaxation oscillator with a novel architecture of low-power decision elements. It is known from conventional designs that the shunt current consumption in the comparators increases when low oscillation frequencies cause slow voltage slewing at the integration capacitor. The proposed circuit overcomes this issue by employing CMOS thyristor circuits, which switch sharply due to internal positive feedback. The current consumption associated with slewing is reduced by up to 50% compared to conventional designs. The temperature deviation of the clock frequency is reduced by a factor of 5.5 through compensation with an on-chip CTAT current source. The required highly negative temperature coefficient is generated efficiently by reusing parts of the on-chip bias current reference, resulting in minimal area and power overhead. A modified start-up circuit achieves a three times faster wake-up of the oscillator, which reduces the active time and hence the power consumption of wireless systems with remote wake-up. The proposed circuit is utilized in a system-on-chip as part of an autonomous Internet-of-Things sensor node.