1 Introduction

Jitter is loosely defined as the timing fluctuation of the square waveforms output by phase-locked loops. More precisely, it is the deviation of a timing event from its ideal position, which in this strict sense is called instantaneous jitter. In addition to instantaneous jitter, there are several other types of jitter, such as period jitter and cycle-to-cycle jitter [5]. This paper considers period jitter (abbreviated as jitter in the following), which is simply the series of periods from all cycles. The methodology of jitter separation, however, is the same for other kinds of jitter.

In practice, what we measure is total jitter (TJ). Statistically, total jitter can be divided into two parts: random jitter (RJ) and deterministic jitter (DJ). Random jitter is due to inherent random noise such as thermal noise, shot noise, random modulation, etc. Deterministic jitter, in contrast to random jitter, changes in a deterministic fashion. It may come from reflections, cross-talk, electromagnetic interference, systematic modulation, etc. Separating and identifying each jitter component is important for understanding the root cause of jitter and, in turn, for improving phase-locked loop design [5].

2 Brief Review of Tail-fitting Algorithm

One popular method of jitter separation, used by Wavecrest Company, is the Tail-fitting Algorithm [5]. It has been written into the relevant national standard [6]. The method consists of two steps: tail identification followed by tail fitting. That is, one first forms the histogram of the jitter series, then identifies the left and right tails of the histogram, and finally finds two Gaussian curves to fit these two tails respectively. Once the two Gaussian curves are found, the quantities of random jitter and deterministic jitter are given by:

$$ \left\{ {\begin{array}{*{20}{c}} {RJ = \frac{{{\sigma_1} + {\sigma_M}}}{2}} \\ {DJ = {\mu_M} - {\mu_1}} \\ \end{array} } \right. $$

Here, \( \sigma_1 \) and \( \sigma_M \) are the standard deviations of the left and right Gaussian curves, and \( \mu_1 \) and \( \mu_M \) are their means, respectively.

The algorithm has several serious drawbacks. First of all, the derivation of the above two formulas was not established in [5]. Further, fitting the histogram using just two curves is theoretically unjustified: in general, more than two curves are needed to fit the overall histogram (refer to the Appendix). Practically, it is very difficult to identify the correct portions for the left and right tails, because the tails were never mathematically defined. The analysis in [5] is based on two peaks for the left and right parts of the histogram. The peaks are not clearly defined either, because the histogram is not monotonic but fluctuates considerably. Even after the histogram passes through a specially designed filter, which is achieved by a sophisticated artificial intelligence algorithm in [5], there is still no guarantee that the two peaks will be prominently identifiable, particularly for small jitter samples. It is even more difficult to make this step fully automatic, i.e., to use as little user intervention or visual inspection as possible. The difficulty of accurately identifying and distinguishing the two tails becomes more serious when the DJ portion is very small and the two “tails” of the histogram therefore lie very close to each other.

Another drawback, which escaped attention in [5], is that the histogram depends on the number of bins used to generate it. Using the histogram as the starting point in this problem unnecessarily adds another variable (and more ambiguity) to the original noisy jitter. Clearly, we should avoid as far as possible using a variable histogram as the starting point for jitter separation.
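The bin-number dependence can be illustrated with a minimal sketch. The synthetic jitter series below (two Gaussian lobes 100 ns apart) and the helper name `histogram_mode` are assumptions made for illustration only; they are not data or code from the experiments in this Letter.

```python
import numpy as np

def histogram_mode(samples, bins):
    """Return the centre of the tallest bin of a histogram with `bins` bins."""
    counts, edges = np.histogram(samples, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

rng = np.random.default_rng(0)
# Hypothetical period-jitter series in ns: two Gaussian lobes, sigma = 5 ns each.
samples = np.concatenate([
    rng.normal(-50.0, 5.0, 2500),
    rng.normal(50.0, 5.0, 2500),
])

# The same data summarized with three different bin counts.
modes = {bins: histogram_mode(samples, bins) for bins in (20, 50, 200)}
for bins, mode in modes.items():
    print(f"bins={bins:4d}  tallest-bin centre = {mode:8.3f} ns")
```

The location of the tallest bin is a property of the chosen binning as much as of the data, which is the extra variable the paragraph above refers to.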

The second step, tail fitting, is considerably easier: it uses a Gaussian curve to fit each tail, which is just a nonlinear optimization problem. The curve-fitting algorithm used in [5] is χ² fitting. For a nonlinear optimization problem, however, the selection of initial parameters needs attention. If the initial values are not appropriate, the fit may either never converge to the globally optimal point or take a long time to converge. The χ² fitting in [5] must first estimate the initial parameters so that “the initial fitting parameters are close to the final converging values” to avoid those problems. This reasoning is circular, because the final converging values are not known in advance. Initial parameter selection is therefore another troublesome problem in the tail-fitting algorithm.

3 Kurtosis-based GMM Approach

3.1 Theoretical Result

The main idea of our method differs radically from that of the previous method. The key change is to move the starting point of jitter separation from the histogram to the original jitter waveform. Instead of fitting the histogram curve directly, we consider the original jitter waveform from which the histogram is generated. Why? We assume that the input jitter series consists of independent realizations of one random variable. If we can estimate the probability density function (pdf) of this random variable, we have equivalently fitted the histogram of this random variable. Our main result is stated in the following theorem, the derivation of which is in the Appendix at the end of this Letter.

Theorem

Consider the situation in which a deterministic jitter of unknown shape is immersed in random jitter whose distribution is Gaussian:

$$ {P_{RJ}}(x) = \frac{1}{{\sqrt {2\pi {\sigma^2}} }}{e^{ - \frac{x^2}{{2{\sigma^2}}}}} $$
(1)

The general expression for \( P_{TJ}(x) \) is then a Gaussian Mixture Model (GMM):

$$ {P_{TJ}}(x) = \sum\limits_{j = 1}^M {b_j}\frac{1}{\sqrt {2\pi \sigma_j^2}}\exp \left[ - \frac{\left( x - \mu_j \right)^2}{2\sigma_j^2} \right] = \sum\limits_{j = 1}^M {b_j}\,p\left( x\left| j;\mu_j,\sigma_j \right. \right) $$
(2)

DJ and RJ are then calculated as follows (assuming \( \mu_1 < \mu_2 < \ldots < \mu_M \)):

$$ \left\{ {\begin{array}{*{20}{c}} {RJ = \frac{{{\sigma_1} + {\sigma_M}}}{2}} \\ {DJ = {\mu_M} - {\mu_1}} \\ \end{array} } \right. $$
(3)

RJ is characterized by the standard deviation σ. In a general setting, however, the σ values of the leftmost and rightmost components are not the same, so the standard deviation (or RMS) value σ of RJ is taken as the average of the two. DJ is quantified by its peak-to-peak value, which is calculated as the distance between the peaks of the leftmost and rightmost Gaussian components. Based on the above theorem, the jitter separation problem becomes one of estimating the parameters of model (2).
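Once the mixture parameters are available, Eq. (3) reduces to a few lines. The following is a minimal sketch; the function name `rj_dj_from_gmm` and the example parameter values are hypothetical and chosen only to show the bookkeeping (components need not arrive sorted by mean).

```python
import numpy as np

def rj_dj_from_gmm(means, sigmas):
    """Apply Eq. (3): RJ is the average sigma of the outermost components,
    DJ is the distance between their means."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    order = np.argsort(means)          # enforce mu_1 < mu_2 < ... < mu_M
    left, right = order[0], order[-1]
    rj = 0.5 * (sigmas[left] + sigmas[right])
    dj = means[right] - means[left]
    return rj, dj

# Hypothetical three-component fit (values in ns), deliberately unsorted.
rj, dj = rj_dj_from_gmm(means=[0.0, -47.0, 47.0], sigmas=[6.0, 5.0, 5.4])
print(f"RJ = {rj:.2f} ns, DJ = {dj:.2f} ns")
```

Note that the inner components affect the quality of the overall fit but do not enter Eq. (3) directly.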

3.2 GMM and Maximum Likelihood Estimation (MLE)

A set of optimal parameters is usually sought under the criterion of maximum likelihood estimation (MLE); that is, we find the model parameters that maximize the likelihood of the input data at hand, which are assumed to be a set of samples drawn independently from the unknown density (2). For a waveform \( x = [x_1\; x_2\; \ldots\; x_N] \) of length N, the GMM likelihood can be written as

$$ p\left( {x\left| \lambda \right.} \right) = \prod\limits_{i = 1}^N {p\left( {{x_i}\left| \lambda \right.} \right)} $$

where λ is the collection of all the unknown parameters in (2), that is, \( \lambda = \{ (b_i, \mu_i, \sigma_i),\; i = 1,2,\ldots,M \} \).

However, this expression is a nonlinear function of the parameters λ, and direct maximization in closed form is not possible. Fortunately, the parameters \( (b_i, \mu_i, \sigma_i) \) (i = 1,2,…,M) can be estimated iteratively using the EM (Expectation Maximization) algorithm [4], given the number of components M.
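For reference, the likelihood above is usually evaluated in logarithmic form, since the product of N small densities underflows quickly. The sketch below is one way this could be written with NumPy; the function name `gmm_log_likelihood` and its argument layout are assumptions for illustration, not part of the original implementation.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, sigmas):
    """log p(x | lambda) = sum_i log sum_j b_j N(x_i; mu_j, sigma_j^2)."""
    x = np.asarray(x, dtype=float)[:, None]           # shape (N, 1)
    b = np.asarray(weights, dtype=float)[None, :]     # shape (1, M)
    mu = np.asarray(means, dtype=float)[None, :]
    sd = np.asarray(sigmas, dtype=float)[None, :]
    # Weighted component densities, shape (N, M).
    comp = b * np.exp(-0.5 * ((x - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)
    # Mixture density per sample, then sum of logs over the samples.
    return float(np.sum(np.log(comp.sum(axis=1))))

# A single standard-normal component evaluated at x = 0.
ll = gmm_log_likelihood([0.0], weights=[1.0], means=[0.0], sigmas=[1.0])
print(ll)
```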

3.3 EM Algorithm

The basic idea of the EM algorithm is, beginning with an initial model parameter set \( \lambda^0 \), to estimate a new model parameter set \( \lambda^1 \) such that \( p(x|\lambda^1) \ge p(x|\lambda^0) \), that is, to monotonically increase the likelihood in each step. The new model parameter set then becomes the initial parameter set for the next iteration, and the process repeats until some convergence criterion is reached. For a general description of the EM algorithm, refer to [9]. For our GMM, the algorithm simplifies to a set of iterative formulas for obtaining the parameters \( \lambda^{k+1} \) from the current parameters \( \lambda^k \) [4, 7]:

$$ \begin{array}{*{20}{c}} {b_j^{k + 1} = \frac{1}{N}\sum\limits_{i = 1}^N {{p^k}\left( {j\left| {x_i} \right.} \right),} } \\ {\mu_j^{k + 1} = \frac{{\sum\limits_{i = 1}^N {{p^k}\left( {j\left| {x_i} \right.} \right){x_i}} }}{{\sum\limits_{i = 1}^N {{p^k}\left( {j\left| {x_i} \right.} \right)} }},} \\ {\sigma_j^{2\left( {k + 1} \right)} = \frac{{\sum\limits_{i = 1}^N {{p^k}\left( {j\left| {x_i} \right.} \right){{\left( {{x_i} - \mu_j^{k + 1}} \right)}^2}} }}{{\sum\limits_{i = 1}^N {{p^k}\left( {j\left| {x_i} \right.} \right)} }}} \\ \end{array} $$

where \( {p^k}\left( {j\left| {x_i} \right.} \right) = \frac{{b_j^kp\left( {{x_i}\left| {j;\mu_j^k,\sigma_j^k} \right.} \right)}}{{\sum\limits_{m = 1}^M {b_m^kp\left( {{x_i}\left| {m;\mu_m^k,\sigma_m^k} \right.} \right)} }} \)

Here \( p\left( x_i \left| j;\mu_j^k,\sigma_j^k \right. \right) \) is the Gaussian density evaluated at \( x_i \) with mean \( \mu_j^k \) and standard deviation \( \sigma_j^k \), i.e., the individual component in (2).
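The update formulas above translate almost line by line into array code. The following is a minimal sketch of one EM iteration for a one-dimensional GMM, assuming NumPy; the function name `em_step` is hypothetical, and the sketch omits safeguards (for example against vanishing variances or underflow in the densities) that a practical implementation would need.

```python
import numpy as np

def em_step(x, weights, means, sigmas):
    """One EM iteration: returns (b^{k+1}, mu^{k+1}, sigma^{k+1})."""
    x = np.asarray(x, dtype=float)
    b, mu, sd = (np.asarray(a, dtype=float) for a in (weights, means, sigmas))

    # E-step: posterior p^k(j | x_i) for every sample and component, shape (N, M).
    dens = b * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)
    post = dens / dens.sum(axis=1, keepdims=True)

    # M-step: the three update formulas.
    nj = post.sum(axis=0)                               # sum_i p^k(j | x_i)
    new_b = nj / x.size
    new_mu = (post * x[:, None]).sum(axis=0) / nj
    new_var = (post * (x[:, None] - new_mu) ** 2).sum(axis=0) / nj
    return new_b, new_mu, np.sqrt(new_var)

# With a single component, one step yields the sample mean and
# the (population) standard deviation, whatever the starting point.
b1, mu1, sd1 = em_step([1.0, 2.0, 3.0, 4.0], weights=[1.0], means=[0.0], sigmas=[1.0])
print(b1, mu1, sd1)
```

Repeated calls, feeding each output back in as the next input, give the iteration described in the text.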

The EM algorithm has some drawbacks. It requires an initialization of the unknown parameter vector λ near the solution; it may get stuck at a local maximum; and it assumes that the total number of mixing kernels is known in advance. The following kurtosis-based EM algorithm effectively addresses all of these limitations.

3.4 Kurtosis-based EM Algorithm for Automatic Model Order Determination

Determining the number of components M in a mixture is thus an important, but difficult, problem.

To deal with this problem, we borrow the method in [7] and call it the kurtosis-based EM algorithm. The value of the likelihood alone does not provide much information regarding the effectiveness of the fit [7]. A new measure for Gaussian mixtures, called total kurtosis, is therefore defined. This measure provides a criterion of how well the estimated Gaussian mixture fits the real data: the smaller the total kurtosis, the better the fit. A lower bound for the total kurtosis is zero.

The new algorithm for Gaussian mixture density estimation starts with a small number of kernels (M = 1 in our application). It then performs EM updates in order to maximize the likelihood of the data, while at the same time monitoring the value of the total kurtosis. Based on the progressive change of the total kurtosis, the algorithm performs kernel splitting and increases the number of kernels in the mixture. This splitting aims at making the absolute value of the total kurtosis as small as possible [7]. In short, starting from a small number of kernels, the algorithm iteratively updates both the number of kernels and their unknown parameters by monitoring both the likelihood and the total kurtosis.
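The total kurtosis is not written out in this Letter. As a hedged illustration only, the sketch below assumes the form we understand [7] to use: a posterior-weighted excess kurtosis for each kernel, combined as a weight-averaged absolute value. The function name `total_kurtosis` and this exact formula are assumptions and should be checked against [7] before use; the splitting rule itself is not sketched here.

```python
import numpy as np

def total_kurtosis(x, weights, means, sigmas):
    """Assumed definition: K_T = sum_j b_j * |k_j|, where k_j is the
    posterior-weighted excess kurtosis of kernel j."""
    x = np.asarray(x, dtype=float)
    b, mu, sd = (np.asarray(a, dtype=float) for a in (weights, means, sigmas))
    z = (x[:, None] - mu) / sd                          # standardized residuals, (N, M)
    dens = b * np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * sd)
    post = dens / dens.sum(axis=1, keepdims=True)       # p(j | x_i)
    # Weighted fourth standardized moment minus 3 (the Gaussian value).
    k_j = (post * z ** 4).sum(axis=0) / post.sum(axis=0) - 3.0
    return float(np.sum(b * np.abs(k_j)))

rng = np.random.default_rng(1)
# Hypothetical bimodal jitter (ns) described by a single kernel: a poor fit.
bimodal = np.concatenate([rng.normal(-50, 5, 5000), rng.normal(50, 5, 5000)])
k_one = total_kurtosis(bimodal, [1.0], [bimodal.mean()], [bimodal.std()])
# The same data described by two well-placed kernels: a much better fit.
k_two = total_kurtosis(bimodal, [0.5, 0.5], [-50.0, 50.0], [5.0, 5.0])
print(k_one, k_two)
```

Under this assumed definition, a single kernel laid over bimodal data gives a total kurtosis far from zero, while the two-kernel description brings it close to zero, which is the behaviour the splitting step exploits.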

Another great advantage of this kurtosis-based EM algorithm is that it neatly bypasses the problem of initial parameter estimation in [5]. Since the algorithm always starts with one mixture component, the selection of the initial mixture parameters is greatly simplified. In the following iterations, the initial parameters are automatically updated from the previous results [7]. It always converges to the globally optimal point (within reasonable accuracy tolerance).

4 Results

In order to test the validity of our algorithm, we applied it to several sets of signals, all collected with an HP oscilloscope. The signals were all FM (frequency modulation) signals with carrier frequency \( f_c \) = 100 kHz, modulating frequency \( f_m \) = 1 kHz, and sampling rate \( f_s \) = 10 MHz. What changed from case to case was the frequency (equivalently, period) deviation produced by the modulating signal, and thus the period jitter of the modulated signal.

Two criteria are to be examined in this specially designed and controlled experiment. The first concerns the deterministic jitter, which ought to approximate the maximal period deviation imposed by the modulating signal. The second concerns the random jitter. Because random jitter characterizes one particular system, its value should be consistent regardless of the frequency deviation introduced by the modulating signal; more precisely, its value should approximate the random jitter measured without any modulating signal present.

To this end, we first collected RJ values from the above signal (with the above \( f_c \) and \( f_s \) settings) without any modulation. Because there was no modulation, the jitter was entirely due to random jitter, and we used a single Gaussian curve to fit the histogram of the period jitter. The standard deviation of the Gaussian curve gave random jitter RJ = 4.78 ns for this nominally pure sine wave. The result is illustrated in Fig. 1.

Fig. 1 Determination of RJ from a sine wave signal without modulation

Next, we studied the FM100K_1K_05K_10M signal. This notation means that \( f_c \) = 100 kHz, \( f_m \) = 1 kHz, maximal frequency deviation (single side) ∆f = 0.5 kHz, and \( f_s \) = 10 MHz. Therefore \( f_{min} \) = 99.5 kHz and \( f_{max} \) = 100.5 kHz; the periods are \( T_{max} \) ≈ 10050 ns and \( T_{min} \) ≈ 9950 ns, so ∆T ≈ 100 ns. Taking ∆T as the expected DJ and the unmodulated RJ = 4.78 ns as the reference, the relative error for DJ was \( RE_{DJ} \) = (100 − 94.3)/100 = 5.7%, and the relative error for RJ was \( RE_{RJ} \) = (5.19 − 4.78)/4.78 = 8.6%. The final result of the GMM fitting is shown in Fig. 2.

Fig. 2 Determination of DJ and RJ from an FM signal
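The arithmetic for the FM100K_1K_05K_10M case can be reproduced with a short script. The helper names `fm_period_span_ns` and `relative_error` are hypothetical; the measured values 94.3 ns and 5.19 ns are those appearing in the relative-error expressions in the text, and 4.78 ns is the unmodulated RJ reference.

```python
def fm_period_span_ns(fc_hz, df_hz):
    """Longest period, shortest period and their difference, in ns,
    for a carrier fc_hz with single-sided frequency deviation df_hz."""
    t_max = 1e9 / (fc_hz - df_hz)
    t_min = 1e9 / (fc_hz + df_hz)
    return t_max, t_min, t_max - t_min

def relative_error(measured, expected):
    """Absolute deviation from the expected value, as a fraction of it."""
    return abs(measured - expected) / expected

t_max, t_min, delta_t = fm_period_span_ns(fc_hz=100e3, df_hz=0.5e3)
re_dj = relative_error(measured=94.3, expected=100.0)   # expected DJ = delta T
re_rj = relative_error(measured=5.19, expected=4.78)    # expected RJ = unmodulated value

print(f"T_max = {t_max:.2f} ns, T_min = {t_min:.2f} ns, dT = {delta_t:.2f} ns")
print(f"RE_DJ = {100 * re_dj:.1f} %, RE_RJ = {100 * re_rj:.1f} %")
```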

Similarly, keeping all other parameters the same, we studied three other cases (changing the frequency deviation to 0.8 kHz, 1 kHz, and 2 kHz respectively) and summarized all results in Table 1. From Table 1, we conclude that this algorithm is effective and accurate to a practically acceptable tolerance (the maximal relative error is within 10%). As far as speed is concerned, the separation calculation takes about 5 minutes on average.

Table 1 Summary of DJ and RJ values and their relative errors from 5 experiments

5 Conclusion

A method for separating deterministic jitter and random jitter based on the Gaussian mixture model, which improves on the Tail-fitting Algorithm, is developed in this paper. The method is sound in theory and effective in practice. Theoretically, the mathematical foundation of the relationship between this model and the quantities of DJ and RJ is rigorously established. From this derivation, we further conclude that the method in this paper can easily be extended to a more general problem, namely estimating the amplitude of an unknown deterministic signal immersed in Gaussian noise.

Practically, our algorithm offers several benefits compared with the Tail-fitting Algorithm in [5]. It does not use the raw histogram, which depends on the bin number; using the histogram as the starting point unnecessarily adds another variable (and more ambiguity) to the original noisy jitter. It operates directly on the original jitter series and is therefore much more tractable and stable than the Tail-fitting Algorithm in [5], although the net effect is still to fit the overall histogram with multiple Gaussian curves. It does not need to distinguish the vague tails, which is the most cumbersome step in [5]. It neatly bypasses the problem of initial parameter estimation in [5]. It always converges to the globally optimal point (within reasonable accuracy tolerance). It uses far fewer jitter samples for the jitter separation calculation: the average jitter length used in our algorithm is about 5,000, while the Tail-fitting Algorithm in [5] uses an average jitter length of 50,000. This is a large reduction, and our algorithm is therefore much faster.

Our algorithm requires an appropriate selection of a set of parameters, in particular one parameter controlling the convergence of the likelihood. The selection is similar to tuning the parameters of a wide-band antenna designed to receive signals over a wide range of frequencies. However, this is not difficult: the values fall within a reasonable region and can be set beforehand by experiment in the lab. The good news is that, when that parameter changes, the accuracy is still ensured, which is most important, although the number of kernels (which dictates speed) differs.

Our work was performed from fall 2001 to summer 2002. For various reasons, we did not consider publishing it until recently. In order to check the timeliness of our work, we carried out a recent (up to 2008) literature survey. We searched extensively in authoritative journals related to this area, such as the Journal of Electronic Testing, Measurement, IEEE Journal of Solid-State Circuits, IEEE Transactions on Circuits and Systems (I and II), IEEE Trans. on Instrumentation and Measurement, etc. We did not find any specific algorithms that overcome the shortcomings of the Tail-fitting Algorithm. We did, however, find five conference papers [1–3, 8, 10] that proposed new jitter decomposition techniques. Unfortunately, they did not point out the drawbacks of the Tail-fitting Algorithm either. We are therefore confident that our work is still timely and novel, at least in the aspect that this Letter’s title claims. Given the page limit of this short Letter, a comprehensive comparison of our algorithm with those in [1–3, 8, 10] is beyond its scope and may be done in the future.

In other words, we confine our discussion in this Letter to the specific comparison of our algorithm with the Tail-fitting Algorithm, and nothing else.