1 Introduction

Modern cryptographic systems depend heavily on random numbers to seed their underlying encryption processes and are therefore bounded by the quality of the random number generation method utilised. The primary measure of a source’s efficacy or “quality” is its entropy, which is commonly computed as \(\text {H}_N[\text {X}] = -\sum ^N_{i=1}P(x_i)\log _2(P(x_i))\), where \(\text {H}_N[\text {X}]\) is the discrete entropy for a sequence of length N drawn from a probability distribution P with realised samples \(x_i\), and the base 2 logarithm provides the entropy in bits. For an ideal source with infinite realisations, the entropy \(\text {H}_\infty [\text {X}]\) should approach 1. Entropy (or min-entropy) can be numerically estimated by a software test suite that implements the NIST SP800-90B [36] and NIST SP800-22 [3] criteria. Correlation methods can also be utilised to measure the quality of the entropy stream [29].
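As an illustration of the entropy expression above, the following minimal sketch computes the empirical (plug-in) entropy of a symbol sequence from its observed frequencies. The function name and the two test sequences are assumptions made for this example and do not form part of any NIST test suite.

```python
import math
from collections import Counter


def shannon_entropy_bits(samples):
    """Empirical Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(samples)
    total = len(samples)
    # H = -sum(p * log2(p)), taken over the symbols that actually occur (p > 0).
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


balanced = [0, 1] * 500           # equal numbers of 0s and 1s
biased = [1] * 900 + [0] * 100    # 90 % ones, 10 % zeros

h_balanced = shannon_entropy_bits(balanced)
h_biased = shannon_entropy_bits(biased)
print(f"balanced: {h_balanced:.3f} bits/symbol")
print(f"biased:   {h_biased:.3f} bits/symbol")
```

It should be noted that this frequency-based estimate ignores ordering: the strictly alternating sequence used above is fully predictable yet still scores 1 bit per symbol, which is one reason why dedicated estimators and correlation measures are applied in practice.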

It is then common to post-process or condition the measured signal to increase the entropy density at the expense of output bit rate, in order to mitigate entropy starvation. This is achieved through source whitening, which decorrelates any periodic information and removes any mean bias from the source, in turn increasing the entropy rate. Von Neumann proposed a simple technique to remove bias that is present in the data based upon a single source. However, more recently cryptographic hash functions are commonly relied upon to perform whitening [36].
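The Von Neumann technique mentioned above can be sketched as follows. The sketch assumes that the input bits are independent with a constant bias; the function name and the example bit stream are chosen for illustration only.

```python
def von_neumann_debias(bits):
    """Von Neumann extractor over non-overlapping pairs.

    The pair (0, 1) yields 0, the pair (1, 0) yields 1,
    and the pairs (0, 0) and (1, 1) are discarded.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out


raw = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
debiased = von_neumann_debias(raw)
print(debiased)
```

For a bias p, each input pair yields an output bit with probability 2p(1 - p), so at most one output bit is produced per four input bits on average. This is the reduction in output bit rate referred to above.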

A physically realisable ideal source should exhibit a flat (white) power spectral density and be Gaussian distributed in the time domain, as this provides maximal entropy under the conditions of \(\text {H}(\sigma ) = \ln (\sigma \sqrt{2 \pi e})\) where \(\sigma \) is the standard deviation, which is equivalent to the root mean squared value as the source should exhibit zero mean. It is evident that an entirely flat frequency spectrum over an infinite bandwidth is impossible to realise; however, maximisation of both the bandwidth and the signal-to-noise ratio is desired (in this context, the signal is the uncorrelated information and the noise represents correlated information). Whilst it is highly desirable to produce an effective random number generator by maximising the entropy rate, other factors such as power consumption, size, and implementation cost must be considered when producing an implementation. To that end, it is highly desirable to explore the feasibility of implementing a high-quality and reliable entropy source on a modern standard complementary metal-oxide semiconductor (CMOS) process.

2 Entropy Sources for Random Number Generation

Random number generators can be classified into two categories: Pseudo Random Number Generators (PRNGs) and True Random Number Generators (TRNGs). Common PRNGs use either linear or nonlinear feedback shift registers to realise recurrence relationships that are fed into either a hash or block cipher. The downside of PRNGs is that they are governed by underlying analytic generating equations; if a majority of the sequence is observed, it is possible to synthesise the analytic form, leading to sequence prediction. It is therefore desirable to use a TRNG that has no underlying governing analytic equation and that may only be described stochastically.

Nyquist–Johnson (thermal) noise is commonly utilised as an entropy source through the amplification of noise present in a resistance [5, 14, 17]. Thermal noise may also be measured through other implicit manifestations such as oscillator phase noise [6, 41, 42] or memory meta-stability [2, 13, 35]. The predominant issue with relying upon thermal effects is the temperature dependence, which may significantly affect the quality of the pre-conditioned entropy stream as temperature varies. This has been shown in particular for resistive implementations, where lower temperatures result in a higher failure rate when performing statistical tests [32].

Chaotic oscillators have been used as TRNGs [28, 44]. A chaotic oscillator may be implemented in either the digital [25] or analogue domain [40]. However, it should be noted that chaotic systems are governed by an underlying set of nonlinear differential equations, and thus machine learning methods may be used to predict the initial conditions and the system state, effectively compromising the true random nature [24].

Random Telegraph Noise (RTN) can be utilised as an entropy source as it is present in MOS devices due to charge trapping between the oxide and semiconductor interface layer as current flows across the channel [9, 11]. The trapping effect can be described by a Brownian motion model which exhibits a Lorentzian-shaped power spectral density (PSD) of \(\frac{1}{f^\alpha }\). Therefore, the use of RTN is undesired as it contains significantly correlated components due to its non-white distribution which would subsequently lower the entropy that is present. Additionally, it has been shown that the statistics of RTN vary across temperature, particularly the bandwidth due to a varying recombination time constant [39].

Shot noise may also be utilised as it is present due to the discrete nature of charge passing through a discontinuous junction. It is spectrally white and presents itself as Poissonian (converging to Gaussian due to the central limit theorem) distributed in the time domain. A discontinuous junction may be realised by forming a shift in the band-gap, which may take the form of either a forward or reverse-biased PN junction, or a thin insulating barrier. The total power spectral density of a shot noise process is known to be proportional to the bias (or DC) current that is present.

$$\begin{aligned} i_{\text{ n,RMS }} = \sqrt{2qI_{\text{ dc }}f_\text{ B }} \end{aligned}$$
(1)

where \(i_{\text{ n,RMS }}\) is the RMS shot noise current, \(I_{\text {dc}}\) is the bias current, \(f_\text {B}\) is the bandwidth and q is the elementary charge. An additional benefit of utilising a shot noise source is that the bias current can be controlled to overcome process, voltage and thermal (PVT) variation. Dark current in photo-detectors [7, 21, 33, 38] and CMOS imaging sensors [26] are examples of realisable reverse-biased junctions on a standard CMOS process; however, they are subject to visible light attacks.
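Equation 1 can be evaluated directly, as in the short sketch below. The bias current of 10 mA and the bandwidth of 150 MHz are illustrative values assumed for this example, and the function name is hypothetical.

```python
import math

Q_E = 1.602176634e-19  # elementary charge in coulombs


def shot_noise_rms(i_dc, bandwidth_hz):
    """RMS shot noise current in amperes from Eq. 1: sqrt(2 * q * I_dc * f_B)."""
    return math.sqrt(2.0 * Q_E * i_dc * bandwidth_hz)


# Assumed operating point: 10 mA bias current observed over 150 MHz.
i_n = shot_noise_rms(10e-3, 150e6)
print(f"i_n,RMS = {i_n * 1e6:.3f} uA")
```

As the RMS current follows the square root of the bias current, quadrupling the bias current only doubles the RMS noise current.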

This leads to the motivation behind utilising a shot noise process wherein the current (and thus the standard deviation of noise produced) may be explicitly controlled, as this prevents an additional attack vector from forming and places the dominant determining factor of the noise source’s robustness to attacks on the periphery and supporting measurement circuitry (i.e. the power supply rejection ratio of any current sources or low noise amplifiers).

The underlying stochastic process must be considered when selecting the appropriate physical phenomenon for constructing an entropy source. A forward-biased PN junction may sufficiently satisfy the criteria of producing shot noise; however, this does not satisfy a complex statistical model (conduction versus tunnelling). Therefore, a quantum tunnelling process is required, as the movement of charge carriers may only be described through the means of a statistical distribution. Hence, a reverse-biased PN junction with a narrow depletion region width or a thin oxide barrier is desired, as the only mechanism for charge carriers to transition across the barrier is quantum tunnelling [12]. This has been discussed in the literature as a viable quantum entropy source because it abides by Fermi’s golden rule, which states that any quantum process has a bounded finite probabilistic transition time between states, meaning that the uncertainty associated with a quantum process can be guaranteed to produce truly random numbers (when ignoring classical or deterministic noise of the system) [15].

This paper presents the design, measurement, and classification of an on-chip quantum entropy source utilising standard elements available on a consumer CMOS process. The contents of the paper focus solely on the entropy source itself. Section 3 explores attainable quantum tunnelling sources and the methodology behind designing a suitable device for harvesting entropy. Section 4 demonstrates small and large signal measurement of a suitable noise source with an emphasis on parasitic extraction. Section 5 assesses the viability of the noise source as an entropy device and reveals the optimal biasing strategy. Conclusions regarding viability are drawn in Section 6.

3 Direct and Fowler Nordheim Tunnelling Sources

As quantum tunnelling is selected as the desired underlying mechanism to be utilised for an entropy source, existing techniques and implementations of tunnelling devices must be evaluated to assess the capabilities of creating tunnelling structures on a traditional solid-state CMOS process supported by existing foundries.

3.1 Review of Tunnelling Processes

A tunnelling diode may be utilised as an entropy source by operating it at the zero-gradient peak of its I–V curve, which is encountered at low bias voltages. At this point the conduction band of the N-type semiconductor becomes aligned with the valence band of the P-type semiconductor, allowing maximum tunnelling to occur across an incredibly thin depletion region [31]. Commercial off-the-shelf tunnelling diodes have been successfully utilised as entropy sources in random number generators by applying a forward-bias voltage that holds the operating point at the aforementioned peak, so that the diode presents as a high-impedance (ideal) current source which is subsequently fed into a transimpedance element for amplification and sampling [8]. Alternatively, it may be possible to repeatedly bias the junction with a current source, which results in the possibility of randomly attaining voltages that follow a Gaussian distribution. However, it is difficult to maintain device operation in this meta-stable region; utilising a pulsed current source therefore produces voltages that follow a Bernoulli distribution [4].

The primary prohibiting factor is that a degenerately doped junction is often required to form a tunnelling diode, which is often impractical on commercial CMOS processes due to the violation of design rules or the degradation of junction quality as a consequence of fabrication steps (annealing). A more moderately doped junction can be utilised to form a Zener diode, which may be reverse-biased to encourage quantum tunnelling. However, care must be taken not to excessively reverse bias the junction, as avalanche events are correlated by nature and would reduce the total entropy [10]. This drives the motivation for finding alternative implementations of narrow potential barriers that are equivalent to a very narrow depletion region.

Alternatively, a thin potential barrier may be formed by an insulating oxide layer which is achieved through the use of a metal-oxide semiconductor field effect transistor (MOSFET). Two dominant forms of quantum tunnelling occur when an electric field is present across the gate oxide layer. Direct tunnelling occurs when the barrier presents as rectangular and this is typically seen under low bias regimes or when the oxide barrier is very thin, hence the charge carriers tunnel without interacting with the conduction band. Fowler Nordheim (F–N) tunnelling occurs when the barrier presents as triangular and this is typically observed under higher bias regimes.

The signal-to-noise ratio (SNR) of the shot noise entropy source is proportional to the achievable bias current; hence it may be improved either by increasing the electric field that is present across the barrier or by reducing the barrier width. It should be noted that the “signal” in this context refers to the noise generated by quantum tunnelling and “noise” pertains to other sources of noise (e.g. thermal or deterministic). This may also be referred to as the excess noise ratio (ENR). Since the magnitude of the power spectral density is proportional to the electric field across the gate oxide (and thus the DC bias current flowing through the junction), it is highly desirable to operate the tunnelling device at a high bias, where it would exhibit a majority of F–N tunnelling [20, 30], which may be described by Eq. 2.

$$\begin{aligned} I_{\text {FN}} = \frac{q^3}{16 \pi ^2 \hbar \phi _\text {b}} S F_{\text {ox}}^2 \text {exp}{\left[ -\frac{4}{3} \frac{1}{F_{\text {ox}}} \frac{\sqrt{2m_{\text {ox}}} \phi _\text {b}^{\frac{3}{2}}}{q \hbar }\right] } \end{aligned}$$
(2)

where \(\phi _\text {b}\) and \(F_{\text {ox}}\) are the work-function and the field strength across the barrier, S is the barrier area, \(\text {m}_{\text {ox}}\) is the effective mass of an electron in the oxide layer, \(\hbar \) is the reduced Planck constant, q is the elementary charge and \(I_{\text {FN}}\) is the resulting Fowler–Nordheim current. Direct tunnelling takes a nearly identical form; the important difference is that direct tunnelling has a linearly proportional term with respect to the applied field (and thus voltage), rather than the squared relationship of Fowler–Nordheim tunnelling. It should be noted that many analytical forms of Fowler–Nordheim tunnelling presented in the literature contain correction factors that are numerically estimated [18, 43].

However, it must be kept in mind that dielectric breakdown must be avoided to ensure device longevity. This is a concern where strong electric fields could potentially cause either soft or hard breakdown in the gate oxide layer, which in turn would present conductive channels that would diminish the device’s efficacy and render it usable for only a finite number of samples [19]. This is critical as it is highly desirable to increase the probability of tunnelling events occurring whilst also minimising the applied voltage to maintain device lifetime. It is not commonly possible to select an arbitrary oxide thickness on a commercial process thus leaving the bias voltage as the only free tunable parameter.

3.2 Analysis of Tunnelling as an Entropy Source

The electrical properties of the noise source must be understood to create a predictable model for designing the source (including optimal bias selection) and the surrounding sensing and amplification circuitry. Equation 2 may be utilised as a model; however, doing so would be cumbersome or may even be impossible, as very often not all the parameters are available to the designer. Therefore, a reduced form of Eq. 2, shown as Eq. 3, is used for analysis, where the physical constants are absorbed into two coefficients to simplify the analysis.

$$\begin{aligned} I_{\text {FN}} = A S V_{\text {ox}}^2\text {exp}\left[ -\frac{B}{V_{\text {ox}}}\right] \end{aligned}$$
(3)

where \(V_{\text {ox}}\) is the applied voltage across the oxide barrier, A and B are model parameters. It is assumed that the oxide layer subject to the applied voltage is modelled as a homogeneous surface (i.e. \(F_{\text {ox}} = \frac{V_{\text {ox}}}{t_{\text {ox}}}\) where \(t_{\text {ox}}\) is absorbed into both A and B).

The equivalent resistance of the noise source must be determined to populate the small signal model pictured in Fig. 1. This allows for the estimation of bandwidth and total available power. The equivalent device impedance also aids the design of the subsequent amplification stages, as it allows for appropriate input impedance selection and noise floor calculations. The small signal resistance may be found by taking the partial derivative of the tunnelling current of Eq. 3 with respect to the applied voltage, as shown in Eq. 4.

$$\begin{aligned} r_{\text {FN}} = \left( \frac{\partial I_{\text {FN}}}{\partial V_{\text {ox}}}\right) ^{-1} = \frac{V_{\text {ox}}}{I_{\text {FN}}} \frac{1}{2 + \frac{B}{V_{\text {ox}}}} = \frac{\text {exp}[\frac{B}{V_{\text {ox}}}]}{AS(2V_{\text {ox}} + B)} \end{aligned}$$
(4)

where \(r_{\text {FN}}\) is the equivalent small signal resistance of the entropy source. The latter part reveals that the equivalent resistance is inversely proportional to the barrier interface surface area. This is intuitive, as the probability of a tunnelling event occurring is proportional to the barrier area.
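The closed form of Eq. 4 can be checked numerically against a finite-difference derivative of Eq. 3, as in the sketch below. The coefficients A, B and S used here are hypothetical values chosen only to exercise the expressions and are not the extracted parameters of any measured device.

```python
import math


def fn_current(v_ox, a, b, s):
    """Reduced Fowler-Nordheim current of Eq. 3: A * S * V^2 * exp(-B / V)."""
    return a * s * v_ox**2 * math.exp(-b / v_ox)


def fn_small_signal_resistance(v_ox, a, b, s):
    """Closed-form small signal resistance of Eq. 4."""
    return math.exp(b / v_ox) / (a * s * (2.0 * v_ox + b))


# Hypothetical model parameters and bias point (illustration only).
A, B, S = 1.0e-3, 10.0, 1.0
v = 2.0
dv = 1e-6

# Central finite difference of the I-V curve gives the small signal conductance.
g_numeric = (fn_current(v + dv, A, B, S) - fn_current(v - dv, A, B, S)) / (2.0 * dv)
r_numeric = 1.0 / g_numeric
r_closed = fn_small_signal_resistance(v, A, B, S)
print(f"numeric: {r_numeric:.4f} ohm, closed form: {r_closed:.4f} ohm")
```

Halving S in this sketch doubles the returned resistance, which reflects the inverse dependence on barrier area described above.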

The premise of Eq. 4 relies upon the tunnelling current being the sole dictator of the small signal resistance and thus it is possible to determine the power available in the noise source by utilising the maximum power transfer theorem [1]. This is done by computing Eq. 5 where \(i_{\text {n},\text {RMS}}\) is substituted from Eq. 1.

$$\begin{aligned} P_{\text {max}} = \frac{ i_{\text {n},\text {RMS}}^2 r_{\text {FN}}}{4} = \frac{ q f_{\text {B}} V_{\text {ox}} }{2( 2 + \frac{B}{V_{\text {ox}}} )} \end{aligned}$$
(5)

where \(P_{\text {max}}\) is the maximum power attainable from the entropy source. It can be seen that the total available power is dominated by a linear dependence on the bandwidth and applied bias voltage.

It is assumed that the total capacitance presented by the device is dominated by the gate-to-substrate overlap region and is a series combination of the oxide capacitance \(C_{\text {ox}}\) and the inversion capacitance \(C_{\text {inv}}\), however, the inversion capacitance is irrelevant at high frequencies and thus can be ignored (the device is assumed to be in strong inversion). Therefore the total capacitance presented by the noise device which may be modelled as a simple parallel plate capacitor is shown in Eq. 6 where the surface area can be substituted from Eq. 3.

$$\begin{aligned} C_{\text {T}} = \frac{\varepsilon _0 \varepsilon _r}{t_{\text {ox}}} S = \frac{\varepsilon _0 \varepsilon _r}{t_{\text {ox}}} \frac{\text {exp}[\frac{B}{V_{\text {ox}}}] I_{\text {FN}}}{AV^2_{\text {ox}}} \end{aligned}$$
(6)

where \(t_{\text {ox}}\) is the gate oxide layer thickness, \(C_{\text {T}}\) is the total equivalent capacitance of the entropy source, and \(\varepsilon _0\) and \(\varepsilon _r\) are the free space and relative permittivity constants, respectively. The intrinsic bandwidth can then be found by combining the results of Eqs. 4 and 6 to form a first-order transfer function, which may subsequently be used to compute the equivalent noise bandwidth \(f_{\text {NB}}\) [16]. Approximating the transfer function as an ideal first-order low-pass (a transfer function H(s) with a single pole located at \(f_{\text {p}} = \frac{1}{2 \pi r_{\text {tot}} C_{\text {T}}}\)) yields the noise bandwidth in Eq. 7 below:

$$\begin{aligned} f_{\text {NB}} = \frac{\pi }{2} \frac{1}{2\pi C_{\text {T}} r_{\text {tot}} } = \frac{1}{4 C_{\text {T}} } \left( \frac{1}{r_{\text {FN}}} + \frac{1}{r_{\text {in}}}\right) \end{aligned}$$
(7)

where \(r_{\text {tot}}\) is the total resistance appearing at the shared node between the noise device and the amplifier input impedance \(r_{\text {in}}\).
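A brief numerical sketch of Eq. 7 is given below. The capacitance of 725 pF and the 50 \(\Omega \) input impedance are values quoted elsewhere in this paper, whereas the 1 k\(\Omega \) device resistance is an assumed placeholder; the function names are hypothetical.

```python
import math


def pole_frequency(c_t, r_fn, r_in):
    """Single-pole corner frequency f_p = 1 / (2 * pi * r_tot * C_T)."""
    r_tot = 1.0 / (1.0 / r_fn + 1.0 / r_in)  # parallel combination at the shared node
    return 1.0 / (2.0 * math.pi * r_tot * c_t)


def noise_bandwidth(c_t, r_fn, r_in):
    """Equivalent noise bandwidth of Eq. 7 for a first-order low-pass response."""
    return (1.0 / r_fn + 1.0 / r_in) / (4.0 * c_t)


C_T = 725e-12   # total capacitance in farads
R_IN = 50.0     # amplifier input impedance in ohms
R_FN = 1.0e3    # assumed small signal resistance in ohms (placeholder)

f_p = pole_frequency(C_T, R_FN, R_IN)
f_nb = noise_bandwidth(C_T, R_FN, R_IN)
print(f"f_p  = {f_p / 1e6:.2f} MHz")
print(f"f_NB = {f_nb / 1e6:.2f} MHz")
```

The ratio of the two results is \(\frac{\pi }{2}\), the usual factor relating the noise bandwidth of a single-pole response to its corner frequency.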

Fig. 1 Noise device and measurement LNA small signal model

The system is assumed to be sensed by an amplifier with a given input impedance \(r_{\text {in}}\) through a bias tee consisting of \(L_{\text {d}}\) and \(C_{\text {s}}\) (which may be ignored for this analysis), as pictured in Fig. 1. The received power is given by Eq. 8, subject to current division between the amplifier input and the small signal impedance \(r_{\text {FN}}\), whilst also assuming that the parasitic series resistance \(R_{\text {s}}\) is zero. This assumption is made because the parasitic series resistance does not pertain to the intrinsic source and \(R_{\text {s}} < r_{\text {FN}}\).

$$\begin{aligned} P_{\text {in, FN}} = i_{\text {n},\text {RMS}}^2 r_{\text {in}} =\left( i_{\text {n},\text {RMS}} \frac{ \frac{1}{ r_{\text {in}} } }{\frac{1}{ r_{\text {in}} } + \frac{1}{r_{\text {FN}}} } \right) ^2 r_{\text {in}} = \frac{q I_{\text {FN}} }{2 C_{\text {T}} (1 + \frac{ r_{\text {in}} }{ r_{\text {FN}} }) } \end{aligned}$$
(8)
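The intermediate steps leading to the closed form of Eq. 8 may be written out as follows. This worked derivation assumes that the bandwidth \(f_\text {B}\) of Eq. 1 is identified with the noise bandwidth \(f_{\text {NB}}\) of Eq. 7 and that the bias current is \(I_{\text {FN}}\).

```latex
$$\begin{aligned}
i_{\text {n},\text {RMS}}^2 &= 2 q I_{\text {FN}} f_{\text {NB}}
  = \frac{q I_{\text {FN}}}{2 C_{\text {T}}} \, \frac{r_{\text {FN}} + r_{\text {in}}}{r_{\text {FN}} r_{\text {in}}}, \\
\frac{ \frac{1}{r_{\text {in}}} }{ \frac{1}{r_{\text {in}}} + \frac{1}{r_{\text {FN}}} } &= \frac{r_{\text {FN}}}{r_{\text {FN}} + r_{\text {in}}}, \\
P_{\text {in, FN}} &= \frac{q I_{\text {FN}}}{2 C_{\text {T}}} \, \frac{r_{\text {FN}} + r_{\text {in}}}{r_{\text {FN}} r_{\text {in}}} \, \frac{r_{\text {FN}}^2}{(r_{\text {FN}} + r_{\text {in}})^2} \, r_{\text {in}}
  = \frac{q I_{\text {FN}}}{2 C_{\text {T}}} \, \frac{r_{\text {FN}}}{r_{\text {FN}} + r_{\text {in}}}
  = \frac{q I_{\text {FN}}}{2 C_{\text {T}} \left( 1 + \frac{r_{\text {in}}}{r_{\text {FN}}} \right) }
\end{aligned}$$
```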

It is then possible to use the above assumptions to construct a simplistic model to provide indicative performance benefits over alternative thermal methods. Combining the tunnelling noise source with a measurement system with an assumed thermally bounded noise floor dominated by \(r_{\text {in}}\) shows that a proportionate increase in excess noise ratio may be demonstrated by finding the ratio between the tunnelling and thermal power. This is important to acknowledge as it is highly desirable that the dominant origin of noise in the system is based upon tunnelling phenomena.

$$\begin{aligned} \text {ENR} = \frac{P_{\text {in, FN}}}{P_{\text {in, NJ}}} = \frac{qI_{\text {FN}}r_{\text {in}}}{2k_\text {b}T} \end{aligned}$$
(9)

where \(\text {ENR}\) is the excess noise ratio and \(P_{\text {in, NJ}}\) is the thermal (Nyquist–Johnson) noise of the measurement system and \(k_\text {b}\) is Boltzmann’s constant.

Device size and bias current are the two parameters available for manipulation when designing a gate tunnelling device as an entropy source. Inspecting Eq. 8 reveals that the total received power is size invariant if the amplification stage input resistance is sufficiently small compared to the tunnelling device’s small signal resistance (\(r_{\text {FN}} \gg r_{\text {in}}\)). This may be a fair assumption as many off-the-shelf amplification stages are designed for a 50 \(\Omega \) system impedance. Similarly, this indicates that a low input impedance transimpedance amplifier should be utilised, as the noise source should present as an ideal current source and the amplifier as an ideal current sink. Ideally, the bias current should be kept constant as this dictates the produced shot noise, which leaves the size as a free parameter. A reduction of size increases \(r_{\text {FN}}\), which reinforces the above relationship. However, the applied bias voltage must be increased to maintain the desired bias current, and care must be taken not to increase the bias voltage so far as to cause dielectric breakdown.

One may be tempted to design for the highest possible ENR, which would result in an increase of \(r_{\text {in}}\) when \(I_{\text {FN}}\) is kept constant; however, doing so would decrease the total power attained, as less current would flow into the input of the amplification stage. This becomes especially critical when other noise sources are present. After evaluating the above constraints, a typical input impedance of 50 \(\Omega \) and a bias current of 10 mA at room temperature (300 K) would give an ENR of 9.86 dB pre-amplification.
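The quoted operating point can be reproduced from Eq. 9 with the short sketch below, which uses the 50 \(\Omega \), 10 mA and 300 K values given in the text. The function name is hypothetical and the physical constants are the SI defined values.

```python
import math

Q_E = 1.602176634e-19  # elementary charge in coulombs
K_B = 1.380649e-23     # Boltzmann constant in joules per kelvin


def excess_noise_ratio(i_fn, r_in, temperature_k):
    """Linear excess noise ratio from Eq. 9: q * I_FN * r_in / (2 * k_b * T)."""
    return Q_E * i_fn * r_in / (2.0 * K_B * temperature_k)


enr = excess_noise_ratio(10e-3, 50.0, 300.0)
enr_db = 10.0 * math.log10(enr)
print(f"ENR = {enr:.2f} ({enr_db:.2f} dB)")
```

With these constants the result lands within about a hundredth of a decibel of the quoted figure. Because Eq. 9 is linear in both \(I_{\text {FN}}\) and \(r_{\text {in}}\), doubling either quantity raises the ENR by approximately 3 dB.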

4 Design and Measurement Procedure

A commercial 40 nm process was selected, on which a range of both P-type and N-type MOSFET devices were tested. All transistors utilised the thinnest oxide of 1.8 nm to provide the largest possible tunnelling probability whilst also minimising voltages and thus oxide stress. The length of the devices was fixed at 1 µm and the total equivalent widths were selected as 0.58 mm, 5.8 mm, 17.4 mm and 52.2 mm. All devices were formed in a triple well to provide substrate isolation from other on-chip circuitry.

Each of the 6 dies was bonded to an 88 pin QFN package to be used in suitable test jigs. Both primary electrostatic discharge (ESD) cells integrated into the pad cells and secondary ESD devices were present. The secondary ESD devices consisted of grounded gate MOSFETs and were kept small to minimise the impact of parasitic capacitance and to minimise leakage current. Care was taken to minimise routing resistance by utilising the entire metal stack-up from the pad cell to the entropy device with the addition of ample via stitching.

Measurements were performed in a consistent \({23}\,^\circ \hbox {C}\) environment unless otherwise specified. Time domain measurements were performed in a combination of a shielding box (Rohde & Schwarz CMW-Z10) and an EM-shielded chamber. The DC I–V characteristic was measured with a Source Measure Unit (Keysight B2192A) over a range from 0 V to 2.7 V with 271 points (10 mV steps) so as not to excessively damage the oxide layer. Each DC sweep was repeated 10 times to remove uncorrelated noise.

AC characteristics were measured with a Rohde & Schwarz ZNB 20 GHz VNA and a Copper Mountain TR1300/1. It should be noted that the stimulus power should be kept relatively small, as the departure from the DC bias point results in a nonlinear change in capacitance. A stimulus power of -36 dBm was chosen as a compromise between signal-to-noise ratio and linearity.

Time domain measurements were performed through the use of a three-stage 1.3 GHz, 94 dB \(\Omega \) gain LNA driving a digitizer (SPECTRUM DN2.445-02) sampling at 500 MSPS at a 14 bit depth with a record length of 16 million points for a total acquisition time of 32 ms. The data was subsequently digitally down-sampled to 150 MHz. The small signal model of the amplification setup is pictured in Fig. 1. The construction of the LNA is shown in Fig. 2 and the frequency response (across bias currents to account for changes in DUT equivalent impedance) with its respective time domain representation (zero bias) is shown in Fig. 3.

Fig. 2 Test Chip 1 LNA PCB

Fig. 3 Test Chip 1 LNA frequency (a) and time domain response (b)

4.1 DC Characteristics

The motivation behind measuring the DC characteristics of the devices is to confirm the presence of Fowler–Nordheim tunnelling at high field strengths and to determine the equivalent small-signal resistance. Verification of Fowler–Nordheim dominance is achieved by constructing a Fowler–Nordheim plot across the 6 chips and 5 devices (30 total), shown in Fig. 4.

Fig. 4 Measured DC characteristics (dotted) versus resistance corrected characteristics (solid) (note: \(J_{\text {FN}} = \frac{I_{\text {FN}}}{S}\))

The measured data show a well clustered response between devices and chips with the exception of a few outliers. These outliers are indicative of device yield, failure modes and their associated behaviour. Typical device failure consists of stray shunt conductive paths being formed through oxide layer defects. This can be captured by augmenting Eq. 3 with a \(GV_{\text {ox}}\) term, where G is the shunt conductance. Performing the change of axes required by the Fowler–Nordheim plot then gives:

$$\begin{aligned} \text {log}\left[ \frac{ I_{\text {FN}} }{ V_{\text {ox}}^2 } \right] = \text {log} \left[ A S \text {exp}\left[ -\frac{B}{ V_{\text {ox}}}\right] + \frac{G}{V_{\text {ox}} }\right] \end{aligned}$$
(10)

Inspecting the RHS of Eq. 10 shows that the exponential term dominates at high bias and the conductive term dominates at lower biases, which is reflected in Fig. 4. However, the conductance is typically sufficiently small that it may be ignored (\(G = 0\)).

Ideally, the Fowler–Nordheim plot would display a completely linear response when only F–N tunnelling is present. This is shown by setting \(G = 0\) in Eq. 10 and expanding the logarithm to arrive at \(\text {log}\left[ \frac{ I_{\text {FN}} }{ V_{\text {ox}}^2 } \right] = \text {log} [ A S ] - \frac{B}{ V_{\text {ox}}}\). However, it can be seen in Fig. 4 that large bias voltages no longer follow the desired linear trend, showing a current lower than that predicted by Eq. 10. Holding the assumption that only Fowler–Nordheim tunnelling is present (as depicted by the simplistic models), the departure from the linear trend can be sufficiently described through the introduction of series resistance. This is a reasonable assumption as resistive components of the interconnect, packaging, and die structure (i.e. substrate) are originally unaccounted for. They may be accounted for by augmenting the reduced tunnelling form of Eq. 3 to include a series voltage drop term dictated by \(V_{\text {ox}} = V_{\text {m}} - I_{\text {FN}}R_{\text {s}}\), leading to Eq. 11.

$$\begin{aligned} I_{\text {FN}} = \underbrace{ A S (V_{\text {m}}-I_{\text {FN}}R_{\text {s}})^2 \text {exp}\left[ {\frac{-B}{V_{\text {m}}-I_{\text {FN}}R_{\text {s}}}}\right] }_{f(V_{\text {m}}-I_{\text {FN}}R_{\text {s}})} \end{aligned}$$
(11)

where \(R_{\text {s}}\) is the parasitic series resistance term and \(V_{\text {m}}\) is the measured voltage across both the interconnect parasitics and the entropy source. Estimation of the series resistance along with the other constants may then be reduced to a numerical optimisation problem. The error between the model’s predicted current and the measured current (\(\epsilon = I_{\text {FN,model}} - I_{\text {FN,meas}}\)) can be minimised, which allows the series resistance to be computed as \(R_{\text {s}} = \frac{V_{\text {m}} - V_{\text {ox}}}{f(V_{\text {ox}})}\). Partially rearranging the equation to \(f^{-1}(I_{\text {FN}}) = V_{\text {m}} - I_{\text {FN}}R_{\text {s}}\) reveals a transcendental relationship which is non-trivial to solve [23]. To that end, it is possible to form a cost function by rearranging Eq. 11 to equal zero. The resulting equation produces the residual error (due to estimates in coefficients and constants) and thus forms the least squares optimisation problem of Eq. 12, as the error should ideally be equal to zero. This may be analytically achieved by finding \(\nabla \epsilon ^2 = 0\), which presents itself as a linear regression problem.

$$\begin{aligned} C_1 = \frac{1}{N}\sum ^N_{n=0} \left[ \text {ln}\left( \frac{I_n}{(V_n-I_nR_\text {s})^2}\right) - \text {ln}(A) + \frac{B}{V_n-I_nR_\text {s}} \right] ^2 \end{aligned}$$
(12)

where N is the total number of samples.
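A minimal sketch of the cost function of Eq. 12 is given below, evaluated on synthetic data generated from the series-resistance model. All parameter values are hypothetical, the product of A and S is lumped into a single coefficient, and the bisection solver is an assumption of this example rather than the procedure used to process the measurements.

```python
import math


def fn_current(v_ox, a_s, b):
    """Reduced F-N model of Eq. 3 with A * S lumped into one coefficient."""
    if v_ox <= 0.0:
        return 0.0
    return a_s * v_ox**2 * math.exp(-b / v_ox)


def solve_current(v_m, a_s, b, r_s, iterations=200):
    """Solve the implicit relation I = f(V_m - I * R_s) for I by bisection."""
    lo, hi = 0.0, v_m / r_s
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fn_current(v_m - mid * r_s, a_s, b) > mid:
            lo = mid   # model current exceeds the trial current: root lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cost_c1(v_meas, i_meas, a_s, b, r_s):
    """Mean squared log-domain residual in the form of Eq. 12."""
    total = 0.0
    for v_n, i_n in zip(v_meas, i_meas):
        v_ox = v_n - i_n * r_s  # remove the assumed series voltage drop
        residual = math.log(i_n / v_ox**2) - math.log(a_s) + b / v_ox
        total += residual**2
    return total / len(v_meas)


# Hypothetical "true" parameters, used only to synthesise the data set.
A_S, B, R_S = 1.0, 12.0, 5.0
voltages = [1.5 + 0.05 * k for k in range(25)]  # 1.5 V to 2.7 V
currents = [solve_current(v, A_S, B, R_S) for v in voltages]

c1_true = cost_c1(voltages, currents, A_S, B, R_S)
c1_no_rs = cost_c1(voltages, currents, A_S, B, 0.0)
print(f"C1 at true parameters: {c1_true:.3e}")
print(f"C1 with R_s ignored:   {c1_no_rs:.3e}")
```

The cost is effectively zero at the parameters used to generate the data and rises once the series resistance is neglected, which is the behaviour that the optimisation exploits.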

The cost function \(C_1\) simply serves to best fit the parameters A, B, and \(R_{\text {s}}\) to the numerical data. However, the quality of fit begins to diminish when the simplified equation no longer holds (i.e. across bias regions due to large dynamic range or from model oversimplification). Therefore, another constraint that is robust to this issue may be introduced to enforce maximisation of curve linearity and assist in predicting the series resistance, as it is assumed a priori that a Fowler–Nordheim plot should only exhibit a linear response. This is achieved by utilising the Pearson (linear) correlation coefficient.

$$\begin{aligned} C_2 = \frac{1}{N}\sum ^N_{n=0} \frac{\text {cov}\left( \frac{1}{V_n}, \frac{I_n}{(V_n - I_nR_\text {s})^2}\right) }{\sigma _{V_n} \sigma _{I_n}} \end{aligned}$$
(13)

where cov is the covariance function. Gradient descent may be applied to the cost functions given by Eqs. 12 and 13 to find the minima; however, the resulting analytical forms are cumbersome to work with. A Monte Carlo optimisation process is therefore utilised to yield the extracted resistance by finding simultaneous minima for both \(C_1\) and \(C_2\) across the parameter space of A, B and \(R_{\text {s}}\). The extracted values of \(A = 1.8 \times 10^{-6} {\frac{\hbox {V}}{\hbox {A}^{2}}}\) and \(B = {53}{\frac{\hbox {MV}}{\hbox {m}}}\) reveal a barrier height of \(\phi _{\text {b}} = 1.7\) eV. Extracted resistances across all the devices are shown in Table 1.
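The linearity criterion can be illustrated with the simplified sketch below, which scans candidate series resistances and keeps the one that makes the Fowler–Nordheim plot most linear. It should be stressed that this is a deterministic grid scan over \(R_{\text {s}}\) alone, substituted here for the Monte Carlo search over A, B and \(R_{\text {s}}\), and that the plot axes are assumed to be \(\frac{1}{V_{\text {ox}}}\) and \(\text {ln}(I/V_{\text {ox}}^2)\). The data are synthetic and all parameter values are hypothetical.

```python
import math


def fn_current(v_ox, a_s, b):
    """Reduced F-N model of Eq. 3 with A * S lumped into one coefficient."""
    if v_ox <= 0.0:
        return 0.0
    return a_s * v_ox**2 * math.exp(-b / v_ox)


def solve_current(v_m, a_s, b, r_s, iterations=200):
    """Solve I = f(V_m - I * R_s) for I by bisection."""
    lo, hi = 0.0, v_m / r_s
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fn_current(v_m - mid * r_s, a_s, b) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fn_plot_linearity(v_meas, i_meas, r_s):
    """Absolute correlation of the F-N plot after removing the I * R_s drop."""
    xs, ys = [], []
    for v_n, i_n in zip(v_meas, i_meas):
        v_ox = v_n - i_n * r_s
        xs.append(1.0 / v_ox)
        ys.append(math.log(i_n / v_ox**2))
    return abs(pearson(xs, ys))


# Synthetic data from hypothetical parameters (true series resistance of 5 ohm).
A_S, B, R_S = 1.0, 12.0, 5.0
voltages = [1.5 + 0.05 * k for k in range(25)]
currents = [solve_current(v, A_S, B, R_S) for v in voltages]

candidates = [0.5 * k for k in range(21)]  # 0 ohm to 10 ohm in 0.5 ohm steps
best_rs = max(candidates, key=lambda r: fn_plot_linearity(voltages, currents, r))
print(f"most linear F-N plot at R_s = {best_rs} ohm")
```

A random search over all three parameters follows the same pattern, with the candidate list replaced by random draws and the selection made on both cost functions.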

Table 1 Extracted series resistances

The values presented above comprise routing parasitics on the test PCB and on-chip interconnect resistance. The PCB routing resistance is readily measured as 0.422 \(\Omega \), meaning that the extracted values shown in Table 1 are dominated by on-chip routing, which aligns with predicted results. Substituting the estimated series resistance values into Eq. 11 produces an improved linear relationship in Fig. 4 with respect to the uncorrected measured data.

Fig. 5 Large signal impedance

To determine the total current received by the successive amplification chain, the small signal resistance can be determined by taking the numerical derivative of the measured data (\(r = \frac{\Delta V_{\text {m}} }{\Delta I_{\text {m}} }\)). The equivalent impedance versus bias voltage across device sizing is displayed in Fig. 5. The resistance decreases with increasing device size and bias voltage, as predicted by Eq. 4, where the \(\text {exp}[\frac{B}{V_{\text {ox}}}]\) term dominates at lower bias voltages and the \(\frac{1}{AS(2V_{\text {ox}} + B)}\) term dominates at higher bias voltages.

4.2 AC Characteristics

The equivalent capacitance must be determined in order to estimate the bandwidth. This can be achieved by performing high-frequency capacitance-voltage (C-V) measurements, biasing the noise device at differing voltages and measuring the impedance parameters using a bias tee and a VNA. Once the measured scattering (S) parameters are converted to impedance (Z) parameters, both the capacitance and the series resistance can be estimated by means of Eqs. 14 and 15, which may be derived from the DUT element pictured in Fig. 1.

$$\begin{aligned} \text {Im}[Z(j\omega )]= & {} \frac{- \omega r_{\text {FN}}^2 C_\text {T}}{\omega ^2 r_{\text {FN}}^2 C_\text {T}^2 + 1 } \bigg |_{\omega \rightarrow \infty } \Rightarrow C_\text {T} = \frac{-1}{\omega \text {Im}[Z(j\omega )]} \end{aligned}$$
(14)
$$\begin{aligned} \text {Re}[Z(j\omega )]= & {} \frac{\omega ^2 r_{\text {FN}}^2 R_{\text {s}} C_{\text {T}}^2 + r_{\text {FN}} + R_{\text {s}} }{\omega ^2 r_{\text {FN}}^2 C_{\text {T}}^2 + 1 } \bigg |_{\omega \rightarrow \infty } \Rightarrow R_{\text {s}} = \text {Re}[Z(j\omega )] \end{aligned}$$
(15)

where \(\omega \) represents angular frequency. A graphical approach to computing Eq. 14 is to take the logarithm of both the imaginary impedance response and the frequency axis, as a pure capacitance should produce a line with a gradient of \(-1\) and a y-intercept that is indicative of the capacitance value. Performing line fitting at each bias point and plotting the value obtained from the intercept yields Fig. 6 for the 52 200 µ\(\hbox {m}^{2}\) device. As the device will be biased well into the inversion region (above 1 mA gate current, or 1 V), the capacitance \(C_{\text {T}}\) remains fairly constant and thus a capacitance of 725 pF may be assumed.

The resistance depicted in Fig. 6 also agrees with the resistance presented in Table 1. This illustrates that extracting the series parasitic resistance in the frequency domain and by means of the tunnelling technique gives equivalent results.
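The graphical approach to Eq. 14 can be sketched as below. The sketch assumes the DUT is modelled as \(R_{\text {s}}\) in series with the parallel combination of \(r_{\text {FN}}\) and \(C_{\text {T}}\), which reproduces the imaginary part given in Eq. 14. The function names, the frequency range, and the values of \(r_{\text {FN}}\) and \(R_{\text {s}}\) are assumptions chosen for a synthetic check, not measured data.

```python
import numpy as np

def dut_impedance(omega, r_fn, c_t, r_s):
    """Impedance of R_s in series with the parallel combination r_FN || C_T."""
    return r_s + r_fn / (1.0 + 1j * omega * r_fn * c_t)

def capacitance_from_fit(omega, z):
    """Fit log10(-Im[Z]) against log10(omega) and read C_T from the intercept."""
    slope, intercept = np.polyfit(np.log10(omega), np.log10(-z.imag), 1)
    # For a pure capacitance -Im[Z] = 1/(omega*C), so the intercept is -log10(C).
    return slope, 10.0 ** (-intercept)

# Synthetic check with assumed element values (not measured data).
omega = 2 * np.pi * np.logspace(8, 9, 50)    # 100 MHz to 1 GHz
z = dut_impedance(omega, r_fn=1e3, c_t=725e-12, r_s=5.0)
slope, c_fit = capacitance_from_fit(omega, z)
c_point = -1.0 / (omega[-1] * z[-1].imag)    # Eq. 14 at the highest frequency
```

A fitted gradient that departs noticeably from \(-1\) would suggest that the chosen frequency range is not yet in the capacitive region, or that fixture parasitics have not been removed.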

It is important to de-embed the test jig from the measurements. Only approximate de-embedding is required, as the expected measured capacitance and resistance are at least an order of magnitude above any parasitics present in the test jig. A shunt-series model was assumed; fixture effects may therefore be removed from the DUT through the use of a short and an open condition in the socket and the use of Eq. 16.

$$\begin{aligned} Z_{\text {dut}} = \frac{Z_{\text {sc}}^2 - (Z_{\text {oc}} + Z_{\text {m}})Z_{\text {sc}} + Z_{\text {m}}Z_{\text {oc}}}{Z_{\text {oc}} - Z_{\text {m}}} \end{aligned}$$
(16)

where \(Z_{\text {sc}}\) is the short-circuit, \(Z_{\text {oc}}\) the open-circuit and \(Z_{\text {m}}\) the measured impedance, which together produce the device under test equivalent impedance \(Z_{\text {dut}}\) to be used in Eqs. 14 and 15.
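Eq. 16 can be checked numerically with a short sketch. The synthetic fixture below assumes a series element followed by a shunt element across the DUT, a topology for which Eq. 16 is exact. The element values and the function name are hypothetical.

```python
import numpy as np

def deembed(z_m, z_sc, z_oc):
    """Eq. 16: remove series and shunt fixture parasitics from a measurement."""
    num = z_sc**2 - (z_oc + z_m) * z_sc + z_m * z_oc
    return num / (z_oc - z_m)

# Synthetic fixture (assumed values): series z_ser, then z_shunt across the DUT.
z_ser = 0.4 + 2.0j
z_shunt = 5e3 - 800.0j
z_true = 12.0 - 35.0j

z_sc = z_ser                                         # socket shorted
z_oc = z_ser + z_shunt                               # socket open
z_m = z_ser + 1.0 / (1.0 / z_shunt + 1.0 / z_true)   # DUT in socket
z_dut = deembed(z_m, z_sc, z_oc)
```

The same function applies element-wise to NumPy arrays, so it can be evaluated across a whole frequency sweep in one call.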

Fig. 6 Capacitance versus voltage (52 200 µ\(\hbox {m}^{2}\) sized device)

5 Time Domain and Entropy Results

Time domain data may be captured for each device at differing bias currents to provide a detailed representation of the power spectral density and to estimate the entropy rate. The effects of the measurement setup (amplification chain and bias tee) must be removed before the data are analysed. It is assumed that the input current is subject to a transimpedance gain transfer function and is combined with additive noise. The setup effects are then removed by rearranging the following for \(S_{I}(f)\):

$$\begin{aligned} S_{V}(f) = |H_{\Omega }(f)|^2 S_{I}(f) + S_{\epsilon }(f) \end{aligned}$$
(17)

where \(S_{\epsilon }(f)\) is the power spectral density of the measurement system noise floor at zero bias, \(S_{I}(f)\) is the input current spectral density, and \(S_{V}(f)\) is the power spectral density of the measured post-amplifier time domain output voltage v(t). Therefore, we wish to find the trans-impedance gain (\(H_{\Omega }\)) of the amplification chain. Equation 18 gives this relationship.

$$\begin{aligned} H_{\Omega } = - \frac{Y_{21}}{(Y_{\text {dut}} + Y_{11})(Y_{\text {L}} + Y_{22}) - Y_{21}Y_{12}} \end{aligned}$$
(18)

where \(H_{\Omega }\) is the amplifier trans-impedance gain, \(Y_{11}, Y_{12}, Y_{21}\) and \(Y_{22}\) are the measured admittance parameters. \(Y_{\text {dut}}\) is the inverse of the DUT impedance measured through the means of Eq. 16. It is commonly assumed that the output is matched (\( Y_{\text {L}} = Y_{\text {0}}\)), where \(Y_{\text {0}}\) is the system admittance (inverse of system impedance \(Z_0\)).
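Eq. 18 can be evaluated directly from the measured admittance parameters, as in the sketch below. The numerical values, the 50 \(\Omega \) system impedance and the function name are assumptions for illustration. The cross-check solves the nodal equations of a two-port driven by a current source across \(Y_{\text {dut}}\) and loaded by \(Y_{\text {L}}\), which is the circuit assumed here to underlie Eq. 18.

```python
import numpy as np

def transimpedance_gain(y11, y12, y21, y22, y_dut, y_load):
    """Eq. 18: output voltage per unit source current for a loaded two-port."""
    return -y21 / ((y_dut + y11) * (y_load + y22) - y21 * y12)

# Assumed single-frequency values purely for illustration.
y11, y12, y21, y22 = 0.02 + 0.001j, -1e-4j, 0.5 - 0.05j, 0.018 + 0.002j
y_dut = 1.0 / (50.0 - 200.0j)
y_load = 1.0 / 50.0          # matched output, Y_L = Y_0 with Z_0 = 50 ohm

h = transimpedance_gain(y11, y12, y21, y22, y_dut, y_load)

# Cross-check: solve the nodal equations for a 1 A source at the input node.
nodal = np.array([[y_dut + y11, y12], [y21, y_load + y22]])
v1, v2 = np.linalg.solve(nodal, np.array([1.0, 0.0]))
```

Here `v2` is the output voltage for a unit input current and should agree with `h`.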

A filter kernel is produced by measuring admittance parameters with a vector network analyser and computing Eq. 18. The filter kernel is then used to deconvolve the amplifier transfer function from the data. This can be performed in the frequency domain as a Wiener deconvolution, described by Eq. 19:

$$\begin{aligned} I(f) = V(f) \frac{1}{H_{\Omega }(f)} \frac{1}{1 + \frac{1}{ |H_{\Omega }(f)|^2 \text {SNR}(f) }} \end{aligned}$$
(19)

where the signal-to-noise ratio is set by \(\text {SNR}(f) = \frac{S_{V}(f)}{S_{\epsilon }(f)}\).
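A minimal frequency-domain sketch of Eq. 19 is given below. It assumes that \(V(f)\), \(H_{\Omega }(f)\) and \(\text {SNR}(f)\) are already sampled on the same frequency grid. The function name and the two-bin example values are hypothetical.

```python
import numpy as np

def wiener_deconvolve(v_f, h_f, snr_f):
    """Eq. 19: estimate the input current spectrum I(f) from V(f)."""
    v_f = np.asarray(v_f, dtype=complex)
    h_f = np.asarray(h_f, dtype=complex)
    snr_f = np.asarray(snr_f, dtype=float)
    # Regularising factor: tends to 1 where |H|^2 * SNR is large and rolls
    # the estimate off where the measurement is dominated by noise.
    weight = 1.0 / (1.0 + 1.0 / (np.abs(h_f) ** 2 * snr_f))
    return v_f / h_f * weight

# Assumed values: two frequency bins with a gain of 500 ohm.
h_f = np.array([500.0 + 0j, 500.0 + 0j])
v_f = np.array([1.0 + 0j, 1.0 + 0j])
snr_f = np.array([1e12, 1.0 / 500.0**2])   # very clean bin, very noisy bin
i_f = wiener_deconvolve(v_f, h_f, snr_f)
```

In the clean bin the result reduces to plain inverse filtering, \(V(f)/H_{\Omega }(f)\), whereas in the noisy bin the estimate is attenuated rather than amplifying the noise floor.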

It is possible to verify that shot noise is the dominant source of noise when the device is under bias. Figure 7 shows the measured power spectral densities for differing bias currents, each normalised by a factor of \(2 q I_{\text {dc}}\). A significant portion of the mid-band response deviates only slightly from 0 dB, which indicates that shot noise is the dominant noise mechanism and confirms the initial hypothesis. As frequency increases, the intrinsic bandwidth limit of the entropy source is reached and the normalised amplitude begins to decrease, showing that sufficient shot noise is no longer being generated.

Another key observation is that the lower bias currents present a larger-than-expected power spectral density. This is because the thermal noise is comparable to the generated shot noise, so the total power is greater than expected. Analytically, dividing the total received power by \(2qI_{\text {dc}}\) yields Eq. 20.

$$\begin{aligned} \frac{i_{\text {NJ}}^2 + i_{\text {FN}}^2}{2qI_{\text {dc}}} = \frac{2 k_{\text {b}} T }{qI_{\text {dc}} R_{\text {in}} } + 1 \end{aligned}$$
(20)

where T is temperature. This relationship highlights that significant temperature increases would produce greater amounts of thermal noise, and that an increase in input resistance would reduce the contribution of the thermal noise, overall improving the entropy source (this is not the case as discussed in Sect. 3). As the bias current increases the shot noise becomes increasingly dominant over the thermal noise which can be observed for bias currents above 500 µA.
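The right-hand side of Eq. 20 can be evaluated to show how the thermal contribution shrinks with bias current. The sketch below assumes \(R_{\text {in}} = 50\,\Omega \) and \(T = 300\) K purely for illustration; neither value nor the function names are taken from the measurement setup.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C

def normalised_noise(i_dc, r_in, temperature=300.0):
    """Eq. 20: total (thermal + shot) noise power divided by 2*q*I_dc."""
    return 2.0 * K_B * temperature / (Q_E * i_dc * r_in) + 1.0

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

# Assumed R_in of 50 ohm at 300 K, evaluated at a few bias currents.
for i_dc in (100e-6, 500e-6, 1000e-6):
    print(f"{i_dc * 1e6:6.0f} uA: {to_db(normalised_noise(i_dc, 50.0)):.2f} dB")
```

The ratio tends towards 1 (0 dB) as the product \(I_{\text {dc}} R_{\text {in}}\) grows, which corresponds to the shot-noise-dominated regime.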

It should be noted that all biasing conditions exhibit a \(\frac{1}{f^{\alpha }}\) response due to the trapping of carriers at the oxide-semiconductor interface. An alternative explanation for this phenomenon is that the accumulated trapped charge changes the effective threshold voltage of the device, which produces a Lorentzian power spectral density due to a strongly dominant low-frequency time constant. The \(\frac{1}{f^{\alpha }}\) terms illustrate that correlation is present in the data and thus this phenomenon serves to lower the overall entropy of the source. This becomes more evident as the device is biased beyond 1000 µA, where an additional slope is introduced at lower frequencies. An analytical form for determining a range of optimal biasing conditions can be obtained by minimising the following cost function:

$$\begin{aligned} \frac{\partial }{\partial I_{\textrm{FN}}} \int _0^\infty \left( V_{\textrm{total}}^2 (f) - 2qI_{\textrm{FN}} f_{\textrm{B}} H_{\Omega }(f) \right) ^2 \textrm{d}f \le \epsilon _{\textrm{bias}} \end{aligned}$$
(21)

where \(V_{\textrm{total}}\) is the measured power spectral density and \(\epsilon _{\textrm{bias}}\) is an empirically chosen constant to allow for measurement tolerances. Applying the above methodology, it can be seen that the entropy source should ideally be biased between 500 µA and 1000 µA.

Fig. 7 2qI Normalised power spectral density across bias

Histogramming the time domain data, which is the measured voltage at the output of the amplification chain, across differing bias currents in Fig. 8 shows that the 16 Mpts of recorded values are well shaped into a normal distribution with zero mean. The standard deviation of the distributions increases proportionately with the bias current, with no introduced skew or change in other higher-order statistical moments.
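The absence of skew and of excess kurtosis can be checked numerically alongside the histogram. The sketch below is one possible check and not the procedure used to produce Fig. 8. The synthetic record, its standard deviation and the function name are assumptions standing in for captured data.

```python
import numpy as np

def standardised_moments(samples):
    """Return mean, standard deviation, skewness and excess kurtosis."""
    x = np.asarray(samples, dtype=float)
    mean, std = x.mean(), x.std()
    z = (x - mean) / std
    return mean, std, np.mean(z**3), np.mean(z**4) - 3.0

# Synthetic stand-in for a captured record (assumed, not measured data).
rng = np.random.default_rng(0)
record = rng.normal(loc=0.0, scale=0.25, size=200_000)
mean, std, skew, excess_kurt = standardised_moments(record)
```

For a normally distributed record both the skewness and the excess kurtosis should be close to zero, with a spread that shrinks as the record length grows.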

Fig. 8 Histogram of values across bias

The entropy rate of the entropy source may be estimated through the use of the NIST SP800-90B test suite with an 8-bit wide binary sequence. The SP800-90B specification is more thematically relevant due to the simultaneous sampling of bits, and is also proposed to replace the older SP800-22 specification (both accessible on GitHub [34, 37]). Data are also validated with the SP800-22 suite to assist with comparison to prior implementations. The measurements were performed with a higher bit-depth sampler to maximise dynamic range; the data must therefore be truncated so that no inactive bits are retained and no bits with useful entropy are discarded. Hence, if the signal amplitude exceeds 8 bits, a subset of bits may be selected in such a way as to maximise entropy. By performing right bit shifts before truncation, it is possible to discard the least significant bits (LSBs), which would typically be dominated by thermal noise at low bias currents, and to make further use of the most significant bits (MSBs), which would be dominated by shot noise (Fig. 9).
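The shift-then-truncate step can be sketched as follows, assuming the samples are stored as unsigned 14-bit ADC codes. A signed representation would first need an offset. The function name and the example codes are hypothetical.

```python
import numpy as np

def truncate_to_byte(samples, shift):
    """Discard `shift` LSBs, then keep the next 8 bits of each ADC code."""
    codes = np.asarray(samples, dtype=np.uint16)
    return ((codes >> shift) & 0xFF).astype(np.uint8)

# Assumed unsigned 14-bit codes; 0x2ABC = 0b10_1010_1011_1100.
codes = np.array([0x2ABC, 0x0001, 0x3FFF], dtype=np.uint16)
no_shift = truncate_to_byte(codes, 0)   # keeps bits 0 to 7
shift_3 = truncate_to_byte(codes, 3)    # keeps bits 3 to 10
```

Increasing `shift` moves the 8-bit window towards the MSBs, so the same function can be swept over shift values when searching for the window with the highest estimated entropy.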

Fig. 9 Normalised PDF error (a) and its respective Fourier transform (b)

Fig. 10 SP800-90B estimated entropy versus bias current

Fig. 11 NIST SP 800-90B contour plot of entropy versus bit

Estimating the entropy of the 8-bit symbols with the NIST SP 800-90B test suite across bias, whilst applying increasing bit shifts, yields Fig. 10. It should be noted that the data were processed at a sampling rate of 62.5 MHz to limit the impact of out-of-band thermal noise. This shows that the raw entropy of the source without post-processing ranges from 0.5 bits to 0.9 bits depending on the biasing conditions. Applying a bias current does indeed increase the entropy, following an approximately linear response. The sudden increase in entropy at 4 mA is most likely due to oxide breakdown events; biasing the entropy source at such a large current (and thus voltage) is therefore undesirable, as it significantly reduces the lifespan. Therefore, the total raw entropy of the noise source is 0.716 bits per bit at a bias current of 3.5 mA.
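For orientation, the simplest of the SP800-90B estimators, the most-common-value estimate, can be sketched as below. This is an illustration of that single estimator as understood from the specification and is not a substitute for the full test suite, which applies several estimators. The function name and the example sequences are hypothetical.

```python
import math
from collections import Counter

def mcv_min_entropy(symbols):
    """Most-common-value min-entropy estimate in bits per symbol."""
    n = len(symbols)
    p_hat = Counter(symbols).most_common(1)[0][1] / n
    # Upper bound of a 99 % confidence interval on the most likely symbol.
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)

# Assumed inputs: a perfectly balanced 8-bit sequence and a constant one.
balanced = list(range(256)) * 100
constant = [7] * 1000
h_balanced = mcv_min_entropy(balanced)
h_constant = mcv_min_entropy(constant)
```

Dividing the per-symbol result by 8 gives an entropy per bit that is comparable in units to the values quoted above. The confidence bound means that even a perfectly balanced finite sequence is reported slightly below 8 bits per symbol.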

Integral nonlinearity (INL) and differential nonlinearity (DNL) may be alternative reasons for discarding the LSBs. Figure 9a shows the effects of DNL on the histogrammed data, where a repetitive, and thus deterministic, pattern is imposed. This is obtained by passing Fig. 8 through an inverse transform to un-bias the distribution. Frequency domain analysis, shown in Fig. 9b, reveals Dirac comb sequences at multiples of up to \(\frac{1}{16}\) of the normalised code rate, meaning that significant DNL is exhibited up to the 4th LSB. This is a concern as the inverse Fourier transform of a Dirac comb is a Dirac comb, which therefore appears as periodic data that serves to reduce the entropy of the system. Conversely, care must be taken not to shift excessively, as this would decrease the total entropy by shifting inactive bits into the produced byte.

It may also be possible to construct output bit-streams through alternative methods by forming streams from each individual bit of the 14-bit analogue-to-digital converter (ADC). Doing so alleviates the potential nonlinearities discussed above by selectively neglecting the severely affected bits. The NIST SP800-90B test suite may then be executed on the bit streams; the results are shown in Fig. 11, where the contours indicate the attained entropy. It can be seen that bits 3 to 7 contain the most entropy and as such should be used to construct any derived bit-streams. Bit number 6 is chosen at 800 µA in accordance with the contour plot results to provide the SP800-90B and SP800-22 test results in Tables 2 and 3 respectively. It should be noted that the information represented in each particular bit is reliant upon the gain of the system, and it may therefore be possible that the desirable region spans a greater number of bits than indicated.
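Forming one stream per ADC bit can be sketched as below. The sketch assumes unsigned 14-bit codes and numbers the bits from 0 at the LSB; the bit-numbering convention used for Fig. 11 is not stated here, so this indexing is an assumption, as are the function names.

```python
import numpy as np

def bit_stream(samples, bit):
    """Return the stream formed by one bit of each ADC code (bit 0 = LSB)."""
    codes = np.asarray(samples, dtype=np.uint16)
    return ((codes >> bit) & 1).astype(np.uint8)

def all_bit_streams(samples, n_bits=14):
    """Stack one stream per ADC bit; row k holds bit k of every sample."""
    return np.stack([bit_stream(samples, k) for k in range(n_bits)])

# Hypothetical codes used only to show the layout of the result.
codes = np.array([0b101, 0b110, 0b011], dtype=np.uint16)
streams = all_bit_streams(codes)
```

Each row of `streams` can then be passed to the test suite independently, and rows with low estimated entropy can be left out of the derived bit-stream.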

A comparison of the performance and implementation cost of the TRNG to previous works is shown in Table 4. It should be noted that several of the implementations displayed include either full system-on-chip implementations or measurement periphery when determining performance. Whilst Table 4 indicates a total power consumption of 1.53 W for the source, the noise source itself consumes \({8.1}\,\hbox {mW}\). This further highlights the scalability of the device: the tunnelling device surface area may be increased to improve low-power performance, but at the cost of bandwidth and thus entropy rate. The resulting bit rate of 312.5 Mbit/s assumes 5 bits with significant entropy at a sample rate of 62.5 MHz.

Table 2 NIST SP800-90B results for the 6th bit at a bias current of 800 µA
Table 3 NIST SP800-22 results for the 6th bit at a bias current of 800 µA
Table 4 Comparison to differing implementations

6 Conclusion

In this paper we have presented the assessment and design techniques for a viable noise source candidate for generating entropy in TRNGs, which has been verified against the NIST SP800-90B and SP800-22 specifications, illustrating that an on-chip MOS-based quantum tunnelling noise source may be used as an entropy generator. This is critical as the intrinsic nature of a quantum tunnelling process attributes large levels of uncertainty to any derived result and lowers any possibility of prediction. The entropy source is realised in a commercial 40 nm process by implementing structures that produce significant amounts of current by means of Fowler-Nordheim tunnelling through oxide layers. Statistical tests reveal that the data are normally distributed as required and that the dominant mechanism is a form of shot noise. Additional techniques for series parasitic estimation were also explored. The proposed entropy source demonstrates potential translational use in applications such as encryption, cryptography, and white noise seeding.