Statistical supervised learning with engineering data: a case study of low frequency noise measured on semiconductor devices

Our practical motivation is the analysis of potential correlations between spectral noise current and threshold voltage from common on-wafer MOSFETs. The usual strategy relies on standard techniques based on Normal linear regression, easily accessible in all statistical software (both free and commercial). However, these statistical methods are not appropriate because the assumptions they rely on are not met. More sophisticated methods are required. A new strategy is proposed, based on recent nonparametric techniques which are data-driven and thus free from questionable parametric assumptions. A backfitting algorithm accounting for random effects and nonparametric regression is designed and implemented. The nature of the correlation between threshold voltage and noise is examined by conducting a statistical test, which is based on a novel technique that summarizes all the relevant information in the data in a color map. The way the results are presented in the plot makes it easy for a non-expert in data analysis to understand the underlying structure. The good performance of the method is demonstrated through simulations, and it is applied to a real data case in a field where these modern statistical techniques are novel and prove very efficient.


Introduction
In the last 50 years, the Semiconductor Industry has reduced the dimensions of transistors significantly, to almost atomic dimensions, with the aim of increasing the production of transistors over time. This down-scaling strategy has serious consequences. On the one hand, transistors become cheaper to manufacture, are able to work at faster switching rates, and consume less power. On the other hand, production defects, among other causes, induce variability and thus produce signal fluctuations, see [1] and [2]. These phenomena cannot be avoided entirely and are becoming more and more important for the Semiconductor Industry. One of the most important consequences is that the scaling of MOS transistors has decreased the current signals down to the level where they are not significantly higher than the fluctuations induced by carrier trapping phenomena, [3]. Therefore, a lot of effort has been made to improve the quality and to decrease noise inside the devices. Noise is a stochastic and unpredictable signal; therefore it is necessary to use statistical methods to analyze its behavior. Any attempt to understand the random behavior of the underlying phenomenon can pave the way for further improvements, see [4,5] and [6], among others. One interesting line of investigation is to analyze the problem from a statistical perspective, and in this sense the study of potential correlations between the switching threshold voltage and noise characteristics is of great importance, as considered in [7], and yet it has received little attention in the current literature.
The main concern of this paper is to explore possible correlations between spectral noise current and threshold voltage measured on common on-wafer MOSFETs (Metal Oxide Semiconductor Field Effect Transistors) and to explain their nature. The response variable is the noise power level, which is dominated by the flicker noise that appears as a characteristic 1/f slope in the frequency domain. The goal is to build a model capable of quantifying the effect of the threshold voltage on this noise signal.
All authors contributed equally to this work.
Although the literature on stochastic models for explaining the random behavior of low frequency noise is extensive (see [4,6,8,9]), there are not many studies specifically focused on the noise-voltage relationship from a statistical learning point of view, that is, based on data. This is where the statistician can come into play, adapting powerful machine learning techniques that have long demonstrated their efficacy both in theory and in practice and bringing them to this particular application field of Physics. The study for this paper is based on a sample obtained as follows. For each transistor on the wafer, a single measurement of threshold voltage is determined. However, the noise is measured at different frequency values for each transistor. As a result, we obtain for each transistor several noise measurements, one for each fixed frequency level.
As explained in [1], to characterize a MOSFET it is very useful to observe the point at which the channel is filled with enough carriers to let an "appreciable" drain current flow. The characteristic switching point or gate voltage is called the threshold voltage (V_th). Several methods can be used in practice to extract the threshold voltage of a transistor. The data analyzed in this paper have been obtained by the second derivative method, also known as the trans-conductance method.
Each transistor, on a silicon wafer, has a specific threshold voltage which usually has a small variability. In the processing of MOS Device Characterization, it is one of the most important and commonly used factors to characterize semiconductor devices, [1,7]. The levels can vary from transistor to transistor. Even on the same wafer, there exist various threshold voltages for the transistors.
Noise is normally understood as a spontaneous fluctuation in current or in voltage. Different types of noise have been observed in every electronic solid state device, including semiconductor devices like MOSFETs. The flicker noise or 1/f noise appears mostly in the low frequency range, where it dominates the power spectral density. This kind of noise is associated with imperfections in the fabrication process and with material defects. It is observed under bias conditions in all semiconductor devices, and the power spectral density usually has a characteristic slope of one decade per decade in the frequency domain. The physical origins of flicker noise are not definitively established. However, there is strong evidence that 1/f noise is caused by fluctuations in the number of carriers and not, as claimed in the past, by fluctuations in the mobility of the carriers. The 1/f noise is then related to the random trapping and detrapping events of charged traps near the silicon oxide (SiO2) surface (see [10]).
Noise measurement of MOSFETs can be performed at fixed bias points, that is, applying fixed voltage or current signals at the gate and drain of the transistor. The measurements are recorded over a certain period of time, transformed into the frequency domain using the Fast Fourier Transform, and afterward converted into the power spectral density (PSD), [7]. These measurements require high accuracy and high resolution to determine the differences between the observations. Depending on the device, the resolution unit can be in the nanoampere range. The experiment providing the data analyzed here has been performed in the Laboratory of Nanoelectronics in the Research Centre for Information and Communications Technologies (CITIC-UGR) at the University of Granada (Spain). Specific equipment with high requirements concerning accuracy, resolution and general connection has been used. Afterward, original scripts in the programming environment R have been developed to implement the statistical methods proposed in this paper to process the data.
In this paper we use modern statistical tools that work under very weak assumptions, mainly smoothing techniques, the backfitting algorithm and the bootstrap. Kernel smoothing is one of the most popular statistical tools for nonparametric regression. Basically, the method predicts the output at a given input as a weighted average of the outputs observed for all training subjects, with larger weights for subjects whose inputs are closer. The backfitting algorithm is widely used to approximate the additive components in multiple regression problems such as the one formulated in this paper. It has long shown very good performance in practical applications and, despite its iterative nature, which makes it more difficult to derive theoretical results, consistency and asymptotic properties have been obtained under weak conditions (see for example [11]). We use this algorithm as an efficient tool to solve our regression problem.
In this sense, our method can be considered a supervised learning algorithm in the usual classification of methods in Machine Learning, according to which a supervised algorithm, broadly speaking, involves building a statistical model for predicting, or estimating, an output based on one or more inputs (see [12]). The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method. As a simple example, the bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit. The power of the bootstrap lies in the fact that it can be easily applied to a wide range of statistical learning methods, including some for which a measure of variability is otherwise difficult to obtain, as happens in our case.
This paper is organized as follows. Section 2 describes the dataset and a first approach based on classical statistics. In Sect. 3 our proposal to analyze the data is presented. Section 4 gives numerical results. A useful graphical test is shown in Sect. 5 and finally the conclusions are in Sect. 6.
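The simple bootstrap example mentioned above (standard errors of linear regression coefficients) can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure used in the paper: the function names, the case-resampling scheme and all numeric values are assumptions introduced here.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bootstrap_standard_errors(X, y, n_boot=500, seed=0):
    """Case-resampling bootstrap estimate of the coefficient standard errors."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        draws[b] = ols_coefficients(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)

# Synthetic data with heavy-tailed errors (hypothetical values, for illustration only)
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=200)
X = np.column_stack([np.ones_like(x), x])

beta_hat = ols_coefficients(X, y)
se_boot = bootstrap_standard_errors(X, y)
```

With repeated measures such as those analyzed in this paper, one would typically resample whole subjects rather than single rows, so that the within-subject dependence is preserved.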

First approach based on the classical linear model
In the context of integrated circuits, a DIE is a rectangular pattern (a small block of semiconducting material) on a wafer that contains circuitry to perform a specific function. The dataset analyzed in this paper has been provided by an experimental study of Wafer PH1WY107MXA4 Alias 1D − 11 on Nanoelectronics Laboratory probe-station #2 (SUSS PA300PSMA).
The sample units are DIEs which are physically located on a wafer that is cut (diced) into many pieces (cells), each containing one copy of the circuit. So the wafer is seen as a square grid where each unit occupies a cell localized in terms of its spatial coordinates. Figure 1 (left panel) presents the spatial arrangement of the units sampled in the grid (wafer). The right panel shows a pictogram of the values of V_th measured for each DIE. Empty cells represent missing data. It is suspected that measurements associated with adjacent cells are correlated and that the correlation decreases as the distance between cells increases. Spatial autocorrelation could therefore be taken into account, and certain geostatistical techniques would be appropriate. This aspect is beyond the scope of this work and will be considered in future research.
In this section, a first approach to the data is carried out, based on classical models which rely on strong assumptions, such as normality and independence of the measurements. We will see that these assumptions are not met by the data and can lead to wrong conclusions. In any case, we will take the results into account as a starting point for the more sophisticated analysis developed in later sections of this paper.

Some descriptive plots
A sample of N = 1068 observations is available. On the one hand, there are Noise measurements taken on-wafer at a total of n = 89 locations, each one containing one DIE. In the sequel sampled units are referred to as DIEs. The Noise information is provided in the frequency domain at three different levels, specifically Freq = 100 Hz, 1000 Hz and 10000 Hz. The bias conditions are fixed at four levels determined by all combinations of two values of drain and gate voltage, which are V_d = 0.5 v, 1 v and V_g = 0.5 v, 1 v. Therefore, we have in total 12 observations related to Noise for each item or DIE. Besides, the value of the threshold voltage V_th is recorded for each DIE. Unlike the variable Noise, the variable V_th is an intrinsic feature of the item and, as such, there is one unique record per DIE. With respect to the Noise variable, we have, from the statistical point of view, a three-factor design with correlated data (this aspect will be discussed later), as each subject is tested for all combinations of the levels of the three factors: Freq, V_d and V_g. In other words, we have a factorial repeated measures design, [13].
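The sample-size arithmetic of this design can be checked by enumerating it. The short sketch below uses illustrative variable names (not taken from the original scripts) and only the levels stated above.

```python
from itertools import product

n_dies = 89
freq_levels_hz = (100, 1000, 10000)
vd_levels_v = (0.5, 1.0)
vg_levels_v = (0.5, 1.0)

# All experimental conditions applied to every DIE (full factorial, repeated measures)
conditions = list(product(freq_levels_hz, vd_levels_v, vg_levels_v))
design = [(die, f, vd, vg)
          for die in range(1, n_dies + 1)
          for (f, vd, vg) in conditions]

obs_per_die = len(conditions)   # 3 * 2 * 2 conditions per DIE
n_total = len(design)           # n_dies * obs_per_die observations overall
```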
Firstly, we notice that the measurements of Noise are strongly skewed to the right, so we consider this variable on the logarithmic scale. Figure 2 presents a bivariate trellis diagram for log Noise versus Freq for all combinations of levels of V_d and V_g. The variation in V_d does not seem to affect the behavior of Noise across the observations, or it has only a small influence, which suggests that this factor does not have a significant effect on the response Noise.
We notice a marked decreasing trend of Noise as Freq increases. In the same way, smaller values of Noise are related to the highest level of V_g. That is, we detect an inverse relationship between each of these two factors and the response.
According to the graph, between-DIE variation is apparent in the four combinations of factors V_d and V_g, the greatest variation being observed for the combination V_d = 1 v and V_g = 1 v.
To capture the between-subject variation at all levels of the factors, we include in the following section a random variable that represents the random effect associated with the DIEs. Moreover, a noticeable difference in between-subject variation can be appreciated when comparing the panels displayed on the left side of Fig. 2. The same idea is suggested by the two panels on the right. This indicates that a random effect associated with each subject modifies the Freq-Noise relationship. In addition, the between-subject variation increases when the V_d (or V_g) level increases from 0.5 v to 1 v, and differences in the slopes are also seen from subject to subject in all panels, which supports the existence of a random effect associated with each individual in the sample.

Linear regression models to predict 1/f Noise
The simplest approach for predicting the value of Noise given a set of experimental conditions is to formulate a linear model with fixed effects for each DIE. Specifically, the value of $\mathrm{Noise}_i$ (in logarithmic scale) for the i-th sample unit, for $i = 1, 2, \dots, 1068$, is expressed as

$$\log \mathrm{Noise}_i = \beta_0 + \beta_{F1000}\, I(\mathrm{Freq}_i = 1000) + \beta_{F10000}\, I(\mathrm{Freq}_i = 10000) + \beta_{V_d}\, I(V_{d,i} = 1) + \beta_{V_g}\, I(V_{g,i} = 1) + \varepsilon_i, \qquad (1)$$

where $I(\cdot)$ denotes the indicator function and the intercept, $\beta_0$, is the estimated log Noise for a DIE measured at the baseline level of the factors, that is, Freq = 100 Hz, $V_d$ = 0.5 v and $V_g$ = 0.5 v. The rest of the $\beta$-coefficients measure the relative change in the log Noise scale when the corresponding covariate is considered at the levels indicated in the above expression. For example, the coefficient $\beta_{F1000}$ quantifies the change in the response caused by changing the conditions from Freq = 100 Hz to Freq = 1000 Hz, controlling for the other factors, $V_d$ and $V_g$. The rest of the coefficients in the model can be interpreted similarly. The residual term $\varepsilon_i$ represents the random error and it is assumed to be an independent realization of a Normal random variable with mean 0 and standard deviation $\sigma$ (the same for all observations).
According to the above linear model, the change in log Noise caused by, for example, an increment of Freq from 100 to 1000 is the same for all locations on the wafer. However, the previous inspection of the data gives us reasons to believe that there are between-subject differences with respect to the effect that the covariates have on the variable Noise. In other words, the variation of the value of Noise when the level of Freq is changed from 100 to 1000 is not constant across the different DIEs. On the contrary, there are random effects associated with the subjects (DIEs) that can modify the factor-response relationship from one location to another on the wafer. Therefore, the corresponding effect can take a specific value for each DIE, and the same conclusion is valid for the other factors $V_d$ and $V_g$.
To confirm these insights, we fit a linear model to each DIE-dataset. Specifically we fit a different linear model to explain the effect of each factor (Freq, V d and V g ) on the log Noise variation for each individual location on the wafer. One model is built based on each DIE-dataset. This approach implies 89 × 5 = 445 parameters to be estimated.
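A per-DIE fit of this kind can be sketched with ordinary least squares. The example below is an illustration under stated assumptions: the data are synthetic, the coefficient values in `true_beta` are made up, and the treatment coding with the lowest level of each factor as baseline is an assumption about how the design matrix was built.

```python
import numpy as np
from itertools import product

def design_matrix(freq, vd, vg):
    """Treatment-coded columns: intercept, Freq=1000, Freq=10000, Vd=1, Vg=1."""
    freq, vd, vg = (np.asarray(a, dtype=float) for a in (freq, vd, vg))
    return np.column_stack([
        np.ones(len(freq)),
        (freq == 1000).astype(float),
        (freq == 10000).astype(float),
        (vd == 1.0).astype(float),
        (vg == 1.0).astype(float),
    ])

# The 12 experimental conditions applied to each DIE
conditions = np.array(list(product((100, 1000, 10000), (0.5, 1.0), (0.5, 1.0))))
X = design_matrix(conditions[:, 0], conditions[:, 1], conditions[:, 2])

rng = np.random.default_rng(0)
n_dies = 89
true_beta = np.array([-50.0, -2.3, -4.6, 0.0, -0.5])   # hypothetical values
coefs = np.empty((n_dies, X.shape[1]))
for die in range(n_dies):
    die_effect = rng.normal(scale=0.5)                  # DIE-specific shift
    log_noise = X @ true_beta + die_effect + rng.normal(scale=0.2, size=len(X))
    coefs[die], *_ = np.linalg.lstsq(X, log_noise, rcond=None)

n_parameters = coefs.size   # one row of 5 coefficients per DIE
```

Box-plotting the columns of `coefs` gives a display of the same type as the joint box-plot of per-DIE coefficients; in this synthetic setting the spread of the intercept column reflects the simulated DIE effect.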
The joint box-plot presented in Fig. 3 shows the values of the coefficients estimated from Eq. (1) for each DIE. The graphic corroborates some of the evidence suggested by Fig. 2. The intercept ($\beta_0$) represents the log Noise for the baseline level of the covariates. The remarkable variation exhibited by the values of some of the coefficients (in particular the intercept) suggests the existence of a random component associated with the subjects (DIEs) that modifies the corresponding fixed effect. All the boxes (except the one for the factor V_d) lie on negative values, which means that, in general, the Noise decreases as the corresponding factor takes higher values. It is noticeable that the coefficients corresponding to V_d and V_g are near 0, which suggests that these factors have a smaller influence on the Noise variable; in other words, changes in V_d or V_g will have little impact on Noise.

The repeated measures ANOVA model
From a statistical point of view, a model with 445 parameters is too complex to be useful and on the other hand, one important feature of the dataset is the underlying dependence  structure due to repeated measurements on each individual item. Then we can think of a repeated measures ANOVA model to fit the data. For the inference to be meaningful, the model must satisfy the assumption of normality of the residuals, and sphericity (i.e. the variance is constant for all differences between pairs of within-subjects measurements, see [13]). When violations of the sphericity assumption occur in designs containing repeated measures, particularly when compounded by non-normality, as we will see it is the case in this study, the classical strategy for analysis is not entirely clear. As the sphericity assumption becomes more severely violated, the traditional unadjusted within-subjects F test is known to perform quite poorly, with its Type I error rate becoming extremely inflated (see [13]).
To assess whether the assumption of sphericity is met, we use Mauchly's test of sphericity, together with the estimates of $\varepsilon$, [13]. From the results given in Table 1 we observe that all effects involving factors with more than two levels, except the interaction between Freq and V_d, show departures from sphericity. We confirm this by looking at the estimates of $\varepsilon$ in Table 2.
Turning to the assumption of normality of the residuals, we deduce from the quantile-quantile plot given in Fig. 4 that the residuals are non-normal, so the hypotheses of the model are not admissible. Taking the above considerations into account, a repeated measures ANOVA model with Gaussian residuals is not appropriate to fit our data.
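The quantities behind a sphericity check can be sketched for a single within-subject factor. The code below computes Mauchly's W statistic and the Greenhouse-Geisser estimate of the sphericity parameter from an n-by-k matrix of repeated measures; the function name and the synthetic data are assumptions made here, and the p-value of the test (which needs a chi-square approximation) is left out.

```python
import numpy as np

def sphericity_summary(Y):
    """Mauchly's W and Greenhouse-Geisser epsilon for an (n subjects x k levels) array."""
    Y = np.asarray(Y, dtype=float)
    k = Y.shape[1]
    # Orthonormal contrasts: eigenvectors of the centering matrix with eigenvalue 1
    H = np.eye(k) - np.ones((k, k)) / k
    _, vecs = np.linalg.eigh(H)          # eigenvalues ascending: 0, 1, ..., 1
    C = vecs[:, 1:]
    S = np.atleast_2d(np.cov(Y @ C, rowvar=False))   # covariance of the contrasts
    p = k - 1
    W = np.linalg.det(S) / (np.trace(S) / p) ** p
    eps_gg = np.trace(S) ** 2 / (p * np.trace(S @ S))
    return W, eps_gg

# Synthetic illustration: 89 subjects, 3 levels (e.g. three frequencies)
rng = np.random.default_rng(0)
Y_spherical = rng.normal(size=(89, 3))
Y_nonspherical = Y_spherical * np.array([1.0, 1.0, 5.0])   # one level far more variable

W_sph, eps_sph = sphericity_summary(Y_spherical)
W_non, eps_non = sphericity_summary(Y_nonspherical)
```

Under sphericity both W and the epsilon estimate are close to 1; the epsilon estimate is bounded below by 1/(k-1), and values well below 1 indicate a departure from sphericity.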

The relationship 1/f noise vs. threshold voltage, V th
The experimental study is mainly interested in evaluating the impact that the so called threshold voltage V th has on the 1/f noise. The threshold voltage is defined as the minimum gate-to-source voltage that is needed to create a conducting path between the source and drain terminals.
As mentioned above, the threshold voltage values are extracted as explained in reference [7]. The values are expected to be reasonably well fitted by a Gaussian distribution (see [7] and [1]). However, as we show in this section, the data do not support this usual assumption, as can be deduced from the results of the tests of normality performed using the corresponding functions included in the software R, [14]. Table 3 shows the results obtained when checking normality of noise measurements under all bias conditions. Wrong conclusions can be drawn if we rely on this hypothesis without corroborating it by means of adequate statistical tests. For example, a parametric least-squares fit with normally distributed residuals would lead to the conclusion that there is no relation between V_th and 1/f noise at any of the bias conditions considered here. However, a smoothed scatterplot obtained by means of locally-weighted polynomial regression, with no parametric restrictions on the variables involved, leads to the plots presented in Fig. 5. We have drawn a separate scatterplot for each combination of bias conditions and frequency, which gives 12 fits in total. In the absence of a relationship, the line fitted to each scatterplot should show no trend, which is not the situation in all cases. So we cannot rule out the existence of a relationship of some nature between V_th and Noise, and we need to explore this issue from a different perspective, building a model that also quantifies the effect of the different levels of the factor Freq considered in the design. Our proposal is presented in Sect. 3.

The supervised learning algorithm proposed
The data at hand come from a repeated measures design in which each experimental unit (e.g. DIE) is tested in more than one experimental condition, [13]. In general, whereas observations on different units are assumed independent, the observations on the same subject are not.
In this section we first discuss the proposed model and then explain the algorithm used to fit the data. Our method does not rely on strong parametric restrictions; on the contrary, it is data-driven.

The model
We describe our proposal in a generic scenario although the main interest is the practical implementation focused on the DIE dataset described in Sect. 2.
Let us consider

$$y_{ij} = m(x_i) + b_j + \alpha_i + \varepsilon_{ij}, \qquad (2)$$

where
• $y_{ij}$ are the observed responses, which are related to a one-dimensional numeric covariate $X$ and a factor $Z$ with levels $Z_j$, $j = 1, 2, \dots, J$. For a specific subject $i$ the value $X = x_i$ is observed, and the response is measured for all subjects at all levels of factor $Z$; then $y_{ij}$ is the response of subject $i$ when tested at level $Z_j$, $i = 1, 2, \dots, n$; $j = 1, 2, \dots, J$;
• $m(\cdot)$ represents the fixed effect or population function. It quantifies the relationship between the response and the covariate, and it is assumed not to change across the levels of $Z$. No specific functional form for $m$ is considered; the only assumption is that it is a smooth function in the sense of differentiability.
• $b_j$ is the fixed effect associated with level $Z_j$ of the factor. The restriction $\sum_{j=1}^{J} b_j = 0$ must be met for identifiability of the model.
• $\alpha_i$ is the random effect associated with the i-th subject. For two different subjects, $i \neq i'$, $\alpha_i$ and $\alpha_{i'}$ are independent random variables with mean 0 and variance $\sigma_\alpha^2$. We assume this variable is specific to the subject $i$ and is not related to the particular combination of levels of the factors the subject is tested at.
• $\varepsilon_{ij}$ is the random error or residual component. It is associated with the subject $i$ and also with the level $Z_j$. The errors are assumed to be independent and identically distributed random variables with mean 0. Moreover, for all $i$ and $j$ [...]
We assume that $m$ is twice continuously differentiable. Then, for any fixed $x$ in a generic domain $\mathcal{X}$, $m$ can be approximated by a linear function within a neighborhood of $x$ by a Taylor expansion, i.e.

$$m(x_i) \approx m(x) + m'(x)(x_i - x) = \beta_0 + \beta_1 (x_i - x),$$

where we denote $\beta_0 = m(x)$ and $\beta_1 = m'(x)$. Then, for $x_i$ in a neighborhood of $x$, model (2) can be approximated by the local linear mixed effects model

$$y_{ij} \approx \beta_0 + \beta_1 (x_i - x) + b_j + \alpha_i + \varepsilon_{ij}.$$

The backfitting algorithm [16] was introduced to estimate additive models. It has shown good performance in simulations as well as in applications to real data problems; a complete theory for the two-dimensional model can be found in [17], extended to high-dimensional problems in [18]. In [19] a backfitting algorithm was proposed to fit a random varying-coefficient model based on longitudinal data, using ideas similar to those of mixed-effects models to account for time-varying effects as well as the intra-subject dependence structure. The real dataset we are analyzing can be seen as a longitudinal study in which the frequency domain plays the role of the time domain of a typical longitudinal study. Our model is close to the model proposed in [19], with important differences. In our case the covariate ($V_{th}$) is not a longitudinal variable, so the effect of the covariate on the response does not vary with frequency. On the other hand, unlike the work of [19], the covariate is introduced in the model fully non-parametrically. We sketch in the following the steps of our algorithm.
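Step 3. Mixed-effects model fitting. Put $r = r + 1$, and define $y^{(r)}_{ij} = y_{ij} - \hat{m}^{(r-1)}(x_i)$, for all $i = 1, \dots, n$ and $j = 1, 2, 3$. Then fit the following mixed-effects model:

$$y^{(r)}_{ij} = b_j + \alpha_i + \varepsilon_{ij},$$

or, in matrix notation,

$$\mathbf{y}^{(r)}_i = \mathbf{b} + \mathbf{1}_J\, \alpha_i + \boldsymbol{\varepsilon}_i,$$

where $\mathbf{b} = (b_1, \dots, b_J)^\top$ is a vector of fixed effects, $\mathbf{1}_J$ is the column vector of dimension $J$ with all its components equal to 1, $\alpha_i$ is a random effect such that $\alpha_i \sim (0, \sigma_\alpha^2)$, and the residuals satisfy $\boldsymbol{\varepsilon}_i \sim (\mathbf{0}, \Sigma_i)$, with $\Sigma_i$ a covariance matrix of dimension $J \times J$. Suitable estimates can then be obtained by minimizing the objective function given in [20]. Since $\sigma_\alpha^2$ and $\Sigma_i$ are unknown, to solve the minimization problem we use linear mixed model software such as the function lme(), see [21]. We extract the best linear unbiased predictions (BLUPs) of the random effects from the fitted model, which are denoted $\hat{\alpha}^{(r)}_i$ for $i = 1, 2, \dots, n$, as well as the estimates of the fixed effects, denoted $\hat{b}^{(r)}$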

Step 4. Convergence. Repeat Steps 2-3 until convergence, that is, until the difference between two consecutive estimates is small enough. More specifically, the algorithm stops at the r-th iteration when this difference falls below a prescribed tolerance. A flowchart displaying the steps of Algorithm 1 detailed above is presented in Fig. 6.

Fig. 6 Flowchart of the step-by-step procedure given in Algorithm 1
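The iteration can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the authors' R implementation: the initialization and the smoothing step are assumptions (they are not spelled out in the text above), a Gaussian kernel is used, and closed-form moment estimates for a balanced random-intercept model with independent errors stand in for a call to lme(). The true curve, the effects and the bandwidth in the usage part are hypothetical values.

```python
import numpy as np

def local_linear(x_eval, x, y, h):
    """Local linear estimate of E[y | x] at the points x_eval (Gaussian kernel)."""
    x_eval = np.atleast_1d(x_eval)
    fit = np.empty(len(x_eval))
    for k, pt in enumerate(x_eval):
        d = x - pt
        w = np.exp(-0.5 * (d / h) ** 2)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        fit[k] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return fit

def fit_random_intercept(resid):
    """Balanced model resid[i, j] = b_j + alpha_i + e_ij with sum(b) = 0.
    Moment (ANOVA) variance estimates are an assumption replacing lme()."""
    n, J = resid.shape
    grand = resid.mean()
    b = resid.mean(axis=0) - grand                    # fixed effects, sum to zero
    row_dev = resid.mean(axis=1) - grand
    e = resid - grand - b[None, :] - row_dev[:, None]
    sigma2_e = (e ** 2).sum() / ((n - 1) * (J - 1))
    msa = J * (row_dev ** 2).sum() / (n - 1)
    sigma2_a = max((msa - sigma2_e) / J, 0.0)
    shrink = sigma2_a / (sigma2_a + sigma2_e / J) if sigma2_a > 0 else 0.0
    return b, shrink * row_dev, sigma2_a, sigma2_e    # shrink * row_dev: BLUP-type predictions

def backfit(x, y, h, tol=1e-6, max_iter=50):
    """Alternate smoothing (Step 2) and mixed-model fitting (Step 3) until stable (Step 4)."""
    n, J = y.shape
    b, m_hat = np.zeros(J), np.zeros(n)               # assumed initialization
    for r in range(1, max_iter + 1):
        partial = (y - b[None, :]).mean(axis=1)       # responses corrected by current b
        m_new = local_linear(x, x, partial, h)
        b, alpha, s2a, s2e = fit_random_intercept(y - m_new[:, None])
        change = np.max(np.abs(m_new - m_hat))
        m_hat = m_new
        if change < tol:
            break
    return m_hat, b, alpha, s2a, s2e, r

# Synthetic usage example (all values hypothetical)
rng = np.random.default_rng(0)
n, J = 100, 3
x = rng.uniform(0, 1, size=n)
m_true = np.sin(2 * np.pi * x)
b_true = np.array([1.0, 0.0, -1.0])
alpha_true = rng.normal(scale=0.5, size=n)
y = m_true[:, None] + b_true[None, :] + alpha_true[:, None] + rng.normal(scale=0.2, size=(n, J))
m_hat, b_hat, alpha_hat, s2a, s2e, n_iter = backfit(x, y, h=0.08)
```

In this balanced setting with the sum-to-zero constraint, the smoothing step depends on the data only through the subject means, so the loop stabilizes after very few passes.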

Asymptotic properties of the nonparametric estimator, m
Given the specifications of our model, the solution of the local linear smoothing problem posed in Step 2 (smoothing) of the algorithm is the same as the solution of the following problem: find $\hat{\beta}_0$ and $\hat{\beta}_1$ as the minimizers of

$$\sum_{i=1}^{n} \left\{ \bar{y}_{i\cdot} - \beta_0 - \beta_1 (x_i - x) \right\}^2 K_h(x_i - x),$$

where $K_h(u) = h^{-1} K(u/h)$ for a kernel function $K$ and bandwidth $h$ and, as defined above, $\bar{y}_{i\cdot} = J^{-1} \sum_{j=1}^{J} y_{ij}$. Now we can appeal to the theory on nonparametric estimation given in [22] to establish the asymptotic properties of the estimator obtained. In particular, we can obtain a limit (asymptotic) expression of the mean squared error, that is, $\mathrm{MSE}\{\hat{m}(x)\} = E[\{\hat{m}(x) - m(x)\}^2]$. Let $f(x)$ denote the density function of the covariate $X$, and let $m''(x)$ be the second derivative of the function $m$. Associated with the kernel function $K$, let us define $R(K) = \int K^2(u)\,du$ and $\mu_2 = \int u^2 K(u)\,du$. Finally, denote by $\mathbf{1}_J$ the column vector of dimension $J$ with all its components equal to 1. Then, when $n \to +\infty$, the asymptotic MSE is given by

$$\left\{ \tfrac{1}{2} h^2 \mu_2\, m''(x) \right\}^2 + \frac{R(K)}{n h f(x)} \left( \sigma_\alpha^2 + J^{-2}\, \mathbf{1}_J^\top \Sigma\, \mathbf{1}_J \right) + o\!\left( h^4 + (nh)^{-1} \right),$$

where $\Sigma$ denotes the covariance matrix of the residual vector, taken here to be common to all subjects, and $o(\cdot)$ is a function with the property $o(t)/t \to 0$ when $t \to 0$. From this expression we can deduce the consistency of the estimator, which means that the estimator gets closer to the true function as $n \to +\infty$ (provided that $h \to 0$ and $nh \to +\infty$). The first term of the sum gives the asymptotic squared bias of the estimator, while the second one gives the asymptotic variance. As can be seen, the choice of the bandwidth parameter is important to keep the balance between the two terms. A bandwidth $h \approx n^{-1/5}$ leads to asymptotic squared bias and variance of the same order, and then the asymptotic error of the estimator is of order $O(n^{-4/5})$.
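The balance between bias and variance can be made explicit with a short derivation. Here the leading squared-bias and variance terms are written generically as $C_1 h^4$ and $C_2/(nh)$; the constants $C_1 > 0$ and $C_2 > 0$ are shorthand introduced for this sketch and collect all factors that do not depend on $h$ or $n$.

```latex
\begin{aligned}
\mathrm{AMSE}(h) &= C_1 h^{4} + \frac{C_2}{n h}, \\
\frac{d}{dh}\,\mathrm{AMSE}(h) &= 4 C_1 h^{3} - \frac{C_2}{n h^{2}} = 0
\;\Longrightarrow\;
h^{*} = \left( \frac{C_2}{4 C_1} \right)^{1/5} n^{-1/5}, \\
\mathrm{AMSE}(h^{*}) &= 5\, C_1 (h^{*})^{4}
= \frac{5}{4}\, (4 C_1)^{1/5}\, C_2^{4/5}\, n^{-4/5}.
\end{aligned}
```

The minimizing bandwidth is therefore proportional to $n^{-1/5}$, and at that bandwidth both terms decay at the rate $n^{-4/5}$.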

The bandwidth parameter, h
One problem inherent to kernel smoothing is the appropriate choice of the bandwidth or smoothing parameter h. The literature on the subject is very extensive and various methods compete when choosing the value of the parameter that provides the resulting estimator with the optimal properties (see for example [22] for details). The selection of the bandwidth in our context will not be treated in this paper. For the examples that will be seen below, different options have been tested leading to similar results. Finally to estimate the curves presented in the Figs. 8 and 9 corresponding to the real application, as well as for estimated curves of Fig. 7 in the simulations, plug-in bandwidth selection has been considered.
To avoid a potential, and undesirable, influence of the bandwidth, in Sect. 5 we use a multiscale methodology to solve the testing problem formulated there. This methodology allows the problem to be solved without being conditioned on any particular bandwidth.

Simulations
To assess the performance of the algorithm, we next present the results of a simulation study. We consider the following regression model: [...] To imitate the conditions of the real example considered throughout this paper, we simulate data under the following specifications: [...], where the standard deviation has also been calculated from the data. We have considered a sample of n = 100 subjects, each one being tested at J = 3 different experimental conditions. The experiment has been repeated a total of R = 500 times and the results are summarized in Table 4, from which the good performance of the method can be assessed, in particular for the estimation of the parametric component of the model, as deduced from the low estimated values of the bias, Eq. (5), and the standard deviation, Eq. (6).
Table 5 Semi-parametric mixed-effects model for 1/f noise dataset
To assess the accuracy of the estimation of the nonparametric component of the model, that is, $m_1$ for Model 1 and $m_2$ for Model 2, we have considered, for each point $x$, the average value of the estimated curve over the R samples, that is, we calculate

$$\bar{m}_k(x) = \frac{1}{R} \sum_{r=1}^{R} \hat{m}^{(r)}_k(x),$$

where $\hat{m}^{(r)}_k(x)$ is the estimate based on the r-th sample, for $x \in (0, 1)$ and $k = 1, 2$. The reported estimated errors for the two models considered have been 5.8e-4 for Model 1 and 1.6e-4 for Model 2. Figure 7 shows the accuracy of the estimate with respect to the true curves. For each graph in the figure, the black solid line represents the corresponding true curve considered in the model (left panel for Model 1 and right panel for Model 2), and the black dotted line is the corresponding averaged estimated curve.
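Monte Carlo summaries of this kind can be computed as in the sketch below. The formulas are the usual ones (average estimate minus true value, and sample standard deviation over replicates); the use of the divisor R - 1, the simulated design and all parameter values are assumptions made for illustration, not the specifications used for Table 4.

```python
import numpy as np

def monte_carlo_bias_sd(estimates, true_value):
    """Bias and standard deviation of an estimator over R simulated replicates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean(axis=0) - true_value
    sd = estimates.std(axis=0, ddof=1)      # divisor R - 1 (an assumption)
    return bias, sd

# Illustrative replicates: fixed effects of a balanced random-intercept design
rng = np.random.default_rng(0)
R, n, J = 500, 100, 3
b_true = np.array([1.0, 0.0, -1.0])         # hypothetical, sums to zero
estimates = np.empty((R, J))
for r in range(R):
    alpha = rng.normal(scale=0.5, size=(n, 1))      # subject effects
    eps = rng.normal(scale=0.2, size=(n, J))        # residuals
    y = b_true + alpha + eps
    estimates[r] = y.mean(axis=0) - y.mean()        # level means minus grand mean

bias, sd = monte_carlo_bias_sd(estimates, b_true)
```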

Real dataset
In this section we apply the backfitting algorithm defined in Sect. 3.2 to the dataset consisting of noise measurements observed on a wafer as described in Sect. 2. As explained previously, the wafer is divided into n = 89 DIEs and each one has been tested under four combinations of gate voltage and drain voltage, and three different levels of frequency. The response variable is the noise measured on the DIE in logarithmic scale, Y. We have considered the data for each combination of drain voltage (V_d) and gate voltage (V_g), so we have 4 different sub-samples of size 267 each. We have run the algorithm separately for each sub-sample and the results obtained for all cases have been compared. It is supposed that the response Y depends on one covariate X = V_th and one factor Z = Frequency. Besides, we consider a random term in the model specific to each particular subject or DIE.
The semi-parametric mixed-effects model of Eq. (2) has been fitted to the data separately for the four sub-samples determined by the different bias conditions: V_d = V_g = 0.5 v (Fig. 8); V_d = 1 v, V_g = 0.5 v (Fig. 9); V_d = 0.5 v, V_g = 1 v (Fig. 10); and V_d = 1 v, V_g = 1 v (Fig. 11). The estimates of the parameters involved in the model are given in Table 5. Specifically, columns 1-3 give the estimated components of the fixed-effect parameter vector due to the different levels of the factor Z, and column 4 gives the estimate of the standard deviation of the random-effect term induced by the different DIEs, that is, $\hat{\sigma}_\alpha$. In each figure, the left panel gives the result for Z = 100 Hz; the middle panel gives the result for Z = 1000 Hz; and the right panel gives the result for Z = 10000 Hz. As can be appreciated from the estimates in Table 5 and the corresponding figures, when the DIEs are working under gate voltage V_g = 1 v, the measured noise values increase, as does the variability. Also, the estimated nonparametric component, which controls the effect of the threshold voltage on the noise, presents a certain curvature indicating a change of trend. In particular, in Fig. 11, which displays the results for the DIEs working at a drain voltage V_d = 1 v, the curve seems to describe an increasing tendency, meaning that the higher the switching threshold voltage, the higher the noise power level produced by the signal. It is important to remember that these graphs are based on a single estimate of the curve (that is, only one value of h), so the result is not conclusive. In Sect. 5 below we discuss this question in more depth, using the graphical SiZer map tool. This tool performs inference to detect which features the function actually has, that is, whether there is a statistically significant increase or decrease.

Model evaluation based on bootstrap methods and the SiZer tool
We are interested in hypothesis testing in order to assess the significance of the main effects. Since our main interest lies in the relationship between 1/f noise and threshold voltage, $V_{th}$, we focus on the testing problem whose null hypothesis is

$$H_0: m'(x) = 0 \quad \text{for all } x \in \mathcal{X}, \tag{7}$$

where x denotes any value of $V_{th}$, whose support is $\mathcal{X}$, and $m'$ is the first derivative of the nonparametric component of the model. The null hypothesis of Eq. (7) asserts that the threshold voltage has no effect on the noise power level, against the alternative that there are regions of non-null slope, which would imply a significant effect. To solve this testing problem we propose the use of the graphical tool SiZer map; see, for example, [23] for a detailed description of the SiZer methodology. SiZer stands for significant zero crossings of the derivatives and was introduced by [24] as a powerful exploratory graphical tool for density and regression functions. This tool uses kernel smoothing to estimate the structure that underlies the data and treats the smoothing parameter as a scale parameter in order to visualize the underlying features of the function under study. The features that are not explained by sampling variability are revealed through the construction of confidence intervals for the first derivative of the function. The main idea of SiZer is to consider the full family of smooths, because the curves estimated under different smoothing scales may provide different information on the variation of the curve.
The SiZer methodology relies mainly on a plot called the gradient SiZer map. First, a family of smooth estimators of the target function, indexed by the bandwidth parameter, is obtained. Second, the gradient SiZer map, which displays inference about the first derivative of the curve, is built as follows: for each bandwidth and each value in the support of the curve, a confidence interval for the first derivative is calculated, and the sign of the interval is represented on the map using a color code. Considering $n_h$ different bandwidths and $n_x$ estimation points, each pixel in the $(n_x \times n_h)$ map is coded as red if the confidence interval at that estimation point lies entirely below zero, indicating a significant decrease; blue if the confidence interval lies entirely above zero, indicating a significant increase; purple if zero is within the confidence limits (no significant increase or decrease); and gray in regions where the data are too sparse to infer significance. A change from blue to red in the map means that the underlying curve presents a significant peak, that is, a local maximum; a change from red to blue indicates a significant valley, that is, a local minimum.
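The color code described above can be sketched as a small classification rule. This is only an illustration: the function names `sizer_color` and `sizer_map` are hypothetical, and the sparsity rule (an effective sample size below 5) is an assumption borrowed from common SiZer practice, since the text only says that the data are "too sparse".

```python
def sizer_color(ci_lower, ci_upper, ess, min_ess=5.0):
    """Return the color of one (x, h) pixel of a gradient SiZer map.

    ci_lower, ci_upper: limits of the confidence interval for the slope.
    ess: effective sample size at that pixel (assumed sparsity measure).
    min_ess: assumed threshold below which the pixel is declared too sparse.
    """
    if ess < min_ess:
        return "gray"      # too few data to infer significance
    if ci_lower > 0.0:
        return "blue"      # interval entirely positive: significant increase
    if ci_upper < 0.0:
        return "red"       # interval entirely negative: significant decrease
    return "purple"        # zero inside the interval: no significant slope


def sizer_map(lower, upper, ess, min_ess=5.0):
    """Color every pixel; inputs are nested lists indexed [bandwidth][location]."""
    return [
        [sizer_color(lo, up, e, min_ess) for lo, up, e in zip(lo_row, up_row, e_row)]
        for lo_row, up_row, e_row in zip(lower, upper, ess)
    ]
```

For instance, a row of pixels that reads blue, purple, red from left to right at a given bandwidth would be interpreted as a significant peak at that scale.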

The SiZer Map Algorithm
In this section we propose a graphical test of whether the switching voltage has a significant effect on the noise observed in a transistor. The graphical test is based on scale and space inference and is outlined in the following (see [24] and [23] for details). For a given bandwidth h and a given value x we construct the estimator detailed in Sect. 3, from which we obtain an estimate of the derivative $m'(x)$. To develop the SiZer methodology we need to construct confidence intervals for $m'(x)$, and hence we need an estimate of the variability of the slope estimator. We propose a bootstrap procedure for estimating the variance of $\hat{m}'_h(x)$. Fix a value $\alpha \in (0, 1)$ and let $z_{\alpha/2}$ be the quantile of order $1 - \alpha/2$ of the Normal(0,1) distribution. For a given bandwidth h and a given value x, construct the local-linear estimate of the slope at point x, $\hat{m}'_h(x)$, that is, the slope coefficient of the local-linear fit, as explained in Sect. 3.2.
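As an illustration of the local-linear slope estimate, the following minimal sketch computes the level and slope of a kernel-weighted straight-line fit at a single point. It assumes a Gaussian kernel and a plain regression of y on x, so it leaves out the fixed and random effects handled by the backfitting algorithm of Sect. 3.2; the function name `local_linear` is hypothetical.

```python
import math


def local_linear(x, y, x0, h):
    """Return (level, slope) of the local-linear fit at x0 with bandwidth h.

    Minimizes sum_i w_i * (y_i - a - b * (x_i - x0))**2 with Gaussian weights
    w_i = exp(-0.5 * ((x_i - x0) / h)**2), using the closed-form solution of
    the 2x2 weighted least-squares normal equations.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # assumed Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("not enough data near x0 for this bandwidth")
    level = (s2 * t0 - s1 * t1) / det   # estimate of m(x0)
    slope = (s0 * t1 - s1 * t0) / det   # estimate of m'(x0)
    return level, slope
```

Evaluating `local_linear` on a grid of locations and a grid of bandwidths gives the family of slope estimates around which the confidence intervals are built.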
Then construct the confidence interval (CI) for the slope at point x with a confidence level of $1 - \alpha$, following the algorithm below.
Algorithm 2
Step 1. Run Algorithm 1 to obtain the local-linear estimate $\hat{m}_h(x_i)$ at each observation point $x_i$, and compute the residuals $\hat{\varepsilon}_{ij}$ by subtracting from $y_{ij}$ the fitted curve $\hat{m}_h(x_i)$, the estimated fixed effect of frequency level j and the predicted random effect of DIE i, for $i = 1, \ldots, n$ and $j = 1, 2, 3$.
Step 2. Randomly draw residuals following [25] to obtain the bootstrap residuals $\varepsilon^*_{ij}$.
Step 3. Use the re-sampled residuals to generate new response vectors from the predicted values of the fitted model. That is, define $y^*_{ij} = \hat{y}_{ij} + \varepsilon^*_{ij}$, where $\hat{y}_{ij}$ is the predicted value, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, J$.
Step 4. For each bootstrap sample $\{y^*_{ij}\}$, fit a regression model as in Sect. 3.2 and obtain the estimate of the slope at point x, that is, $\hat{m}'_h(x)^{(1)}$.
Step 5. Repeat Steps 1-4 a total of R times. As a result, obtain a sample $\hat{m}'_h(x)^{(1)}, \hat{m}'_h(x)^{(2)}, \ldots, \hat{m}'_h(x)^{(R)}$ for the fixed values of x and h.
Step 6. Define the bootstrap standard deviation of the slope estimator at x, $\hat{\sigma}_{boot,h}(x)$, as the empirical standard deviation of the bootstrap sample obtained in Step 5. The confidence interval for the slope is then $\hat{m}'_h(x) \pm z_{\alpha/2}\, \hat{\sigma}_{boot,h}(x)$.

Define a grid with a total of $n_x$ locations, that is, values of x, and a grid of bandwidths h of size $n_h$, and construct the SiZer map following the color code explained above.

Figures 12-15 show the SiZer maps corresponding to the fits of the 1/f noise data displayed in Figs. 8-11. On the y-axis, h is represented on a logarithmic scale. Figure 12 displays a clear region of red pixels, which means a decreasing trend of the noise power level with the threshold voltage, at least for the intermediate values. In this case the null hypothesis $H_0$ given in Eq. (7) is rejected and, in consequence, we conclude that at this bias condition there is a significant negative correlation between threshold voltage and noise power level, at least over that range. This map clearly agrees with the features that can be seen in Fig. 8.
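The bootstrap steps of Algorithm 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the figures: it fits a plain local-linear regression with a Gaussian kernel (no fixed or random effects), and it resamples centered residuals with replacement, which may differ from the scheme of [25]. The names `local_linear` and `bootstrap_slope_ci` are hypothetical.

```python
import math
import random
import statistics


def local_linear(x, y, x0, h):
    """Return (level, slope) of a Gaussian-kernel local-linear fit at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)
        s0, s1, s2 = s0 + w, s1 + w * d, s2 + w * d * d
        t0, t1 = t0 + w * yi, t1 + w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("not enough data near x0 for this bandwidth")
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def bootstrap_slope_ci(x, y, x0, h, R=200, alpha=0.05, seed=0):
    """Return (lower, upper, sd) for the slope at x0 via a residual bootstrap."""
    rng = random.Random(seed)
    # Step 1: fitted values and residuals (simplified model: y = m(x) + error).
    fitted = [local_linear(x, y, xi, h)[0] for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_resid = statistics.mean(resid)
    resid = [r - mean_resid for r in resid]  # center before resampling
    slope = local_linear(x, y, x0, h)[1]
    boot_slopes = []
    for _ in range(R):
        # Steps 2-3: resample residuals (assumed scheme) and rebuild responses.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        # Step 4: refit on the bootstrap sample and keep the slope at x0.
        boot_slopes.append(local_linear(x, y_star, x0, h)[1])
    # Step 6: bootstrap standard deviation and Normal-quantile interval.
    sd = statistics.stdev(boot_slopes)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return slope - z * sd, slope + z * sd, sd
```

Calling `bootstrap_slope_ci` for every location and bandwidth on the grid yields the interval limits whose signs determine the pixel colors of the map.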
Figure 13 shows a completely purple map, which means that no regions in the support of $V_{th}$ have been found where a zero slope can be rejected; that is, no features are detected at any location (x) or scale (h). The null hypothesis in Eq. (7) cannot be rejected with the data at hand. The conclusion is that at these bias conditions, i.e. $V_d = 1$ V, $V_g = 0.5$ V, the threshold voltage does not have a detectable influence on the noise registered. Again, this case is in good agreement with the plot in Fig. 9. However, the results displayed in Figs. 14 and 15 seem to contradict those observed in Figs. 10 and 11. The two SiZer maps suggest that the voltage-noise relationship is first increasing and later decreasing, since we observe a clear change from blue to red in both figures. This change of trend is also visible in Figs. 10 and 11, but there we can also see that the curve in both cases tends to increase at the right edge of the plot, suggesting that the noise power level increases for the highest threshold voltages. A closer examination of the plots in Figs. 10 and 11 shows that this increasing tendency is supported by only two data points in each case, so we can attribute the increase to a mere boundary effect and not to a true feature of the curve. This reasoning is confirmed by the corresponding SiZer maps in Figs. 14 and 15, where a gray region is plotted on the right edge of the map, revealing that the sparsity of the data in that region does not allow us to draw any clear conclusion about the behavior of the curve there. So, in these two cases, the only change of tendency of

Conclusion
Our motivation for this paper is the analysis of potential correlations between spectral noise current and threshold voltage measured on common on-wafer MOSFETs. A deep exploratory analysis of the data reveals that the data satisfy neither the assumptions underlying the classical Normal linear model nor those of the ANOVA version that accounts for dependency structures resulting from a repeated-measures design. More sophisticated methods have been required to properly analyze the data and reach reliable conclusions. We have designed and run an algorithm based on modern nonparametric statistics and graphical tools that help in the interpretation and understanding of the nature of the data. In particular, we have built a mixed-effects model with a nonparametric component to explain the main relationship of interest, namely the effect of threshold voltage, $V_{th}$, on spectral noise current, Noise, which is considered in the frequency domain. Our method is an adaptation of the backfitting algorithm to our particular requirements. Afterwards, we have proposed a graphical test developed through scale and space inference about the slope of the nonparametric component of the model. Pointwise confidence intervals have been constructed for the slope of the curve at different levels of smoothing (bandwidths). The graphical representation allows one to detect and confirm regions where the relation between the two magnitudes, $V_{th}$ and Noise, is increasing, decreasing or nonexistent. For the four datasets considered, the results reflect different behaviors of the variable Noise with respect to the variable $V_{th}$, depending on the particular bias condition considered.

The noise data analyzed in the manuscript correspond to MOSFET transistors at low frequencies. For these devices and in this frequency range, the main noise source is flicker noise or 1/f noise. For higher frequencies, thermal noise could also be important [1].
In contrast, transit time noise and partition noise are usually more important noise sources for other types of devices and at higher frequencies. We expect the contribution of transit time noise or partition noise to the data used in the present study to be negligible [26].
Although there are important contributions in the recent literature explaining the random behavior of low frequency noise, to our knowledge few studies focus specifically on the noise-voltage relationship from a statistical learning point of view, that is, based on data [7]. This is where the statistician can be of great help, customizing powerful machine learning techniques to make them useful for solving problems in this area of Electronic Engineering.