High-resolution short time interval measurement system implemented in a single FPGA chip

This study proposes a high-resolution short time interval measurement system based on the Vernier delay line (VDL) method. The programmable delay elements (PDEs) in Xilinx field programmable gate arrays (FPGAs) provide a novel realization of the delay lines. The delay lines provide an accurate delay difference of 50 ps that is invariant to process, voltage and temperature (PVT). Excellent consistency of the delay lines is achieved by adjusting the timing and layout to minimize the measurement errors. The resolution achieved was 58 ps. Experimental results indicate a measurement standard deviation of 36 ps, a differential nonlinearity (DNL) of 36 ps and an integral nonlinearity (INL) of 14 ps. The system features high accuracy, easy implementation and low cost.

Short time interval measurement systems with picosecond resolution have been reported in application specific integrated circuit (ASIC) and field programmable gate array (FPGA) devices [12][13][14][15][16][17][18][19]. However, systems with a resolution better than 100 ps are mostly implemented in ASIC chips. Ref. [12] uses 90 nm CMOS ASIC technology to obtain a resolution of 20 ps and Ref. [13] obtains a resolution of 68 ps. Ref. [14] offers a method with a 31 ps resolution using 0.13 μm CMOS process technology, while Ref. [15] achieves a 30 ps resolution by using BiCMOS technology. Ref. [16] obtains a resolution of 312.5 ps using 0.5 μm CMOS technology. However, designing an ASIC device is highly complex and costly, and the turnaround time is slow. This makes ASICs particularly suitable for highly specialized and mass market applications. With regard to systems implemented in FPGA chips, it is very difficult to bring the resolution below 100 ps. Ref. [17] obtains a resolution of 321.5 ps, while the resolutions of [18] and [19] are 121 ps and 150 ps, respectively. Although FPGA-based designs have a coarser resolution, their flexibility, low cost and short turnaround time compared with ASIC architectures make them a good alternative for implementing short time interval measurement systems. Consequently, an FPGA-based design with a resolution finer than 100 ps is of great significance. In this study, a system with 58 ps resolution based on a single FPGA chip is proposed. Furthermore, the proposed short time interval measurement system is PVT invariant and has high accuracy.

*Corresponding author (email: wanghai@mail.xidian.edu.cn)

Measurement principle
When an FPGA chip is used to realize a short time interval measurement system based on the Vernier delay line (VDL) method, the key factors are the structure and realization of the two delay lines, which directly determine the measurement resolution and accuracy. Figure 1 shows the measurement principle of the VDL. The Start and Stop signals are the inputs of the two delay lines. The line through which the Start signal passes is called the Start delay line, while the other is the Stop delay line. The delay of a unit in the Start delay line is $t_1$, while that of a unit in the Stop delay line is $t_2$. In Figure 1, the dashed box represents a delay cell formed by a delay unit from the Start delay line and the corresponding delay unit from the Stop delay line. According to the principles of the VDL, $t_1$ must be larger than $t_2$. The measurement resolution is then $t_1 - t_2$. After each delay cell, the delayed Start signal is

Circuit realization
In designing a time interval measurement system, the two most difficult tasks are minimizing the delay difference between the Start and Stop delay units and ensuring the consistency of the delay line. A high resolution is only possible with a small delay difference, but poor consistency of the delay line can also greatly degrade the resolution. Consequently, traditional time interval measurement systems implemented in FPGA chips have failed to obtain a satisfactory resolution, and the few systems that achieve a resolution finer than 100 ps have complicated structures. Research has therefore increasingly focused on implementing time interval measurement systems in ASIC chips to obtain a high resolution. ASIC technology not only allows independent and reasonably precise control of the internal propagation times of the two input signals, and hence of the resolution and consistency, but also makes it possible to implement relatively long delay loops to adjust the measurement range. However, the design and fabrication cycle in this technology is rather long, and the resulting circuit is not easily adapted to a new application. Although FPGA devices do not offer the circuit-level design flexibility of ASICs, their development time is significantly shorter and their cost is much lower. Therefore, if the two difficulties can be solved, a time interval measurement system implemented in an FPGA chip can obtain a good resolution as well as excellent consistency. In this design, to overcome these difficulties, we developed a new realization of the delay line and obtained a resolution of 58 ps. PDEs were cascaded to form the two delay lines. A PDE features a fully controllable, 64-tap, wrap-around delay chain with a calibrated tap resolution [20].
Each tap can provide an accurate and PVT-invariant delay, because the PDE is driven by an independent external high-precision reference clock provided by an oven controlled crystal oscillator (OCXO), which is not affected by the process, voltage or temperature changes of the FPGA chip. Figure 2 gives the RTL structure of the PDE control module and the PDE. In the PDE control module, the 64 taps are used to delay and quantize one clock period. A phase detector compares the delayed reference clock with the reference clock itself, and the phase difference is used to calibrate the control voltage of the taps through a filter. If PVT variations change the tap delay, a phase difference is detected and the delay-locked loop (DLL) structure compensates for the change. Thus the tap delay is held at the specified value. The control voltage is also sent to the other PDEs to calibrate their tap delays. Therefore, unlike commonly used delay elements in an FPGA, such as carry chains and LUTs, whose delays are sensitive to PVT variations, the PDE can provide a very accurate, PVT-invariant tap delay. Because the 64 taps span one period of the reference clock, the tap delay can be calculated through eq. (1):
$$t_{\mathrm{tap}} = \frac{1}{64\, f_{\mathrm{ref}}} \qquad (1)$$
where $f_{\mathrm{ref}}$ is the frequency of the reference clock.
The frequency of the reference clock can be set within the range of 190-210 MHz or 290-310 MHz. In this design, the frequency of the reference clock must be 310 MHz to guarantee that the tap delay of each PDE suits the measurement system. The tap delay is then $t_{\mathrm{tap}} = 1/(310\ \mathrm{MHz} \times 64) = 50.4$ ps.
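The tap delay arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the function name `tap_delay_ps` and the constant `ALLOWED_RANGES_MHZ` are hypothetical helpers introduced here, and the only inputs taken from the text are the 64 taps per clock period and the two permitted reference clock ranges.

```python
# Illustrative helper names; not part of the original design.
ALLOWED_RANGES_MHZ = [(190, 210), (290, 310)]  # ranges quoted in the text


def tap_delay_ps(f_ref_mhz: float, taps: int = 64) -> float:
    """Tap delay in ps, assuming `taps` taps span one reference clock period."""
    if not any(lo <= f_ref_mhz <= hi for lo, hi in ALLOWED_RANGES_MHZ):
        raise ValueError(f"{f_ref_mhz} MHz is outside the permitted ranges")
    period_ps = 1e6 / f_ref_mhz  # 1 / (f in MHz) is in microseconds; 1 us = 1e6 ps
    return period_ps / taps


if __name__ == "__main__":
    for f in (190, 210, 290, 310):
        print(f"{f} MHz -> {tap_delay_ps(f):.1f} ps per tap")
```

At 310 MHz the helper returns about 50.4 ps, matching the value used in the design; lower reference frequencies give a coarser tap.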
The system is implemented in a Xilinx FPGA chip containing two rows of PDEs, numbered X0Y0-X0Y239 and X2Y0-X2Y239. The arrangement of the PDEs in the FPGA chip is shown in Figure 3, where two PDEs form a group. Figure 1 gives the structure of the two delay lines. Each delay line consists of 120 PDEs connected in sequence: the input of each PDE is connected to the output of the preceding PDE, and its output is connected to the input of the following PDE. The data port and the clock port of each D flip-flop are connected to the outputs of the corresponding PDEs in the two delay lines. In this scheme, the number of delay taps of each PDE in the Start delay line was set to one, and that of each PDE in the Stop delay line was set to zero. Thus the delay of each unit in the Start delay line is the sum of the PDE's intrinsic delay (about 400 ps) and one tap delay (50 ps with a 310 MHz reference clock) [21], while the delay of each unit in the Stop delay line is only the intrinsic delay. Because the intrinsic delay is identical on both lines, it cancels out. Therefore, the delay difference inside each cell is 50 ps, which means the theoretical resolution $t_1 - t_2$ is 50 ps.
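The behaviour of the two delay lines can be illustrated with a small idealized model. The sketch below is not the authors' implementation: it assumes ideal, identical cells with no routing delay, and it assumes that each D flip-flop samples the delayed Start signal on the rising edge of the delayed Stop signal, so that the flip-flop outputs form a thermometer code. The function name `vdl_measure` is hypothetical; the 400 ps intrinsic delay, 50 ps tap delay and 120 cells come from the text.

```python
def vdl_measure(interval_ps, n_cells=120, t_intrinsic_ps=400.0, t_tap_ps=50.0):
    """Idealized Vernier delay line: return (count of ones, estimated interval in ps)."""
    t1 = t_intrinsic_ps + t_tap_ps  # unit delay of the Start line (one tap selected)
    t2 = t_intrinsic_ps             # unit delay of the Stop line (zero taps selected)
    code = []
    for k in range(1, n_cells + 1):
        start_arrival = k * t1               # Start edge enters the lines at t = 0
        stop_arrival = interval_ps + k * t2  # Stop edge enters `interval_ps` later
        # Assumed flip-flop behaviour: latch 1 while Start still leads Stop.
        code.append(1 if start_arrival <= stop_arrival else 0)
    count = sum(code)
    return count, count * (t1 - t2)


if __name__ == "__main__":
    for interval in (0.0, 2000.0, 2030.0, 5000.0):
        count, estimate = vdl_measure(interval)
        print(f"interval {interval:7.1f} ps -> code {count:3d}, estimate {estimate:7.1f} ps")
```

In this ideal model the Stop edge gains 50 ps on the Start edge in every cell, so a 2030 ps interval is reported as 40 steps, or 2000 ps. The measured resolution of the real system differs from 50 ps because of routing delays, which this sketch ignores.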

Improvement and optimization
The performance of circuits in the FPGA is closely related to the layout of the elements and the delays of the paths. Unreasonable placement and routing of the elements may lead to poor consistency of the delay line, which results in poor resolution and a large measurement error. The layout of elements in the default layout mode is shown in Figure 3(a), and it is difficult to discern any layout rule in it. The disordered placement of elements makes the routing irregular and introduces additional delays, of which the maximum is 4250 ps. The measured resolution under the default layout mode is about 4300 ps, while the measurement error is nearly 8 ns. Compared with the theoretical resolution of 50 ps, these figures show that the system cannot provide a precise measurement result even after calibration. Therefore, to improve the measurement accuracy, a manual placement technique is introduced.
Manual placement is usually realized by specifying constraints on the elements. Constraints were first specified for the PDEs only, which makes the path delays between adjacent PDEs (A and B in Figure 1) identical. However, the path delays between the PDE and the D flip-flop (C and D in Figure 1) of the two lines are still indeterminate: the maximum corresponding delay difference between the two lines is 510 ps, while the minimum is 29 ps. The system therefore still cannot satisfy the measurement requirement. We also tried specifying constraints on the flip-flops only. In that case the delay between the PDE and the D flip-flop of the two lines follows a reliable rule, but the delays between the PDEs of the two delay lines are indeterminate within the range of 606-4367 ps. Thus the resolution still cannot be improved.
Therefore, to meet the difficult timing requirements, constraints should be specified for all the elements [23]. A long trial-and-error process was carried out to find the optimal locations for all the elements. The two PDEs belonging to the same cell, one from each delay line, were placed in the same group, and successive cells were placed in an interlaced order. For example, the eighth and ninth PDEs of the Start delay line were placed in positions X0Y17 and X0Y19, while the eighth and ninth PDEs of the Stop delay line were placed in positions X0Y16 and X0Y18. Constraints were also specified for the D flip-flops so that all of them are placed in sequence in slices in the same row (row 0). The eighth D flip-flop was placed in position SLICE_X1Y9 and the ninth in position SLICE_X1Y10. The locations of the elements before and after the constraints are shown in Figure 3(a) and (b), respectively. The locations of the D flip-flops and PDEs in the same cell clearly correspond with each other. In Figure 3, the green units are the PDEs of the Start delay line, the red units are the PDEs of the Stop delay line, the blue units are the D flip-flops, and the white units are unused elements. Figure 3(b) also shows the detailed schematic diagram of the eighth delay cell, where the green unit Start_PDE8 is the eighth delay unit of the Start delay line and the red unit Stop_PDE8 is the eighth delay unit of the Stop delay line.
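The interlaced placement can be pictured as a simple mapping from cell index to site coordinates. The sketch below is an assumption, not the authors' constraint file: the linear rule is extrapolated from the only two cells quoted in the text (the eighth and ninth), the function name `cell_placement` is hypothetical, and the strings are bare site coordinates rather than the syntax of any particular constraint format.

```python
def cell_placement(n: int) -> dict:
    """Site coordinates for delay cell n (rule extrapolated from cells 8 and 9 only)."""
    return {
        "start_pde": f"X0Y{2 * n + 1}",  # Start-line PDE on the odd Y position
        "stop_pde": f"X0Y{2 * n}",       # Stop-line PDE on the even Y position of the same group
        "dff": f"SLICE_X1Y{n + 1}",      # D flip-flop slices placed in sequence
    }


if __name__ == "__main__":
    for n in (8, 9):
        print(n, cell_placement(n))
```

For n = 8 and n = 9 the mapping reproduces the positions given above; whether the same rule holds for every other cell of the actual design is not stated in the text.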
After constraints are specified for all the elements, the routing has excellent consistency. This means that the delays between the PDEs of the two delay chains (A and B) cancel to a large extent, and the delay difference of each cell is stable at 50 ps. The path delays between the PDE and the D flip-flop (C and D) also show good consistency, and the fixed difference can be compensated as a systematic error. The path delays of A, B, C and D before and after manual placement are shown in Figure 4. Thus an easy-to-implement, high-resolution and general purpose short time interval measurement system is realized in a single FPGA chip. The method can be used in other FPGA devices, such as some Altera chips, which are equipped with programmable delay elements and support manual placement [22].

Experiments and results
In the experiments, a standard time interval was generated using two different coaxial cables to delay the input signal. The time interval could be varied by changing the lengths of the two coaxial cables. The time intervals given were within the range of 0-6 ns. About 104 of the total of 120 cells were useful. Given two time intervals $T_1 = N_1 \times \tau$ and $T_2 = N_2 \times \tau$ differing by $T$ (exactly 6 ns), the average quantization step $\tau$ is
$$\tau = \frac{T}{N_2 - N_1} \qquad (3)$$
The non-uniformity of the quantization steps is usually illustrated by a plot of differential nonlinearity (DNL), while the conversion error is commonly presented by a plot of integral nonlinearity (INL). Both nonlinearity errors can easily be obtained by collecting and processing a large sample of measurements of time intervals having a uniform probability distribution within the measurement range of the tested system. Figures 5 and 6 show the differential nonlinearity and integral nonlinearity of the system. The average quantization step (LSB) is calculated from eq. (3) as 6 ns/(104 − 0) ≈ 58 ps. The acquired differential nonlinearity lies in the range of −0.31 to 0.17 LSB. Thus the maximum quantization error equals (0.31+1)LSB/2=0.625 LSB or 36 ps. The integral nonlinearity lies in the range of −3.33 to 0.39 LSB. The maximum integral nonlinearity error, calculated from the raw data shown in Figure 6(a), then amounts to 3.33 LSB or 193 ps [24].
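The processing of such a uniformly distributed sample can be sketched as a code-density calculation. This is a generic illustration and not necessarily the authors' exact procedure: each output code's hit count is assumed to be proportional to the width of its quantization step, DNL is the relative deviation of each count from the mean, and INL is taken as the running sum of DNL. The function names and the small histogram are hypothetical.

```python
def lsb_ps(span_ps: float, n_low: int, n_high: int) -> float:
    """Average quantization step: time span divided by the number of codes covered."""
    return span_ps / (n_high - n_low)


def dnl_inl(hist):
    """DNL and INL (both in LSB) from a code-density histogram of uniform inputs."""
    mean = sum(hist) / len(hist)
    dnl = [h / mean - 1.0 for h in hist]  # relative width error of each step
    inl, acc = [], 0.0
    for d in dnl:
        acc += d  # INL as the cumulative sum of DNL
        inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    print(f"LSB = {lsb_ps(6000.0, 0, 104):.1f} ps")
    hits = [100, 80, 120, 100]  # made-up counts for four codes
    dnl, inl = dnl_inl(hits)
    print("DNL:", [round(d, 2) for d in dnl])
    print("INL:", [round(i, 2) for i in inl])
```

With a 6 ns span over 104 codes the first helper gives about 57.7 ps, which rounds to the 58 ps quoted above.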
Although the nonlinearity errors are somewhat large, they can be greatly reduced by a suitable correction [25][26][27]. The simplest approach involves subtracting a constant term, equal to one half of the sum of the maximum and minimum values of the integral nonlinearity, from the output data. In this way the maximum integral nonlinearity error may be reduced to one-half of the original value [28]. However, much more efficient corrections may be obtained by using the data shown in Figure 6(a) as a correction vector. For example, when the resolution is 0.25 LSB, we can represent the address of a small correction memory by adding two more bits to the output. Figure 6(b) shows the improvement as a result of this correction. The nonlinearity error has been lowered to 0.24 LSB or 14 ps. Figure 7(a) shows the raw data distribution of the output codes for a constant 2 ns time interval repeated 10000 times. The corrected data of the output codes are shown in Figure 7(b). The correction of the integral nonlinearity gives a satisfactory result by lowering the standard deviation at the output from 75 ps to 36 ps. The standard deviation is calculated from eq. (4), where $\sigma$ is the standard deviation, $X_i$ is the result of the $i$th measurement, and $N$ is the number of measurements.
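The two corrections and the standard deviation can be sketched as follows. Several points here are assumptions rather than details from the text: the INL value of a code is taken to mean reported value minus true value (so it is subtracted), the correction vector is rounded to the 0.25 LSB step mentioned above, and the standard deviation uses the sample form with an N − 1 divisor, since eq. (4) is not reproduced here. All function names are hypothetical.

```python
import math


def midrange_correct(inl):
    """Subtract half the sum of the maximum and minimum INL from every value."""
    offset = (max(inl) + min(inl)) / 2.0
    return [v - offset for v in inl]


def vector_correct(codes, inl_vector, step=0.25):
    """Subtract each code's INL, rounded to `step` LSB, from the raw output codes."""
    return [c - round(inl_vector[c] / step) * step for c in codes]


def std_dev(samples):
    """Sample standard deviation (N - 1 divisor assumed)."""
    n = len(samples)
    mean = sum(samples) / n
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))


if __name__ == "__main__":
    # Extremes of the raw INL reported in the text, in LSB.
    corrected = midrange_correct([-3.33, 0.39])
    print("worst-case INL after constant correction:",
          round(max(abs(v) for v in corrected), 2), "LSB")
    print("vector-corrected codes:", vector_correct([2, 2, 1], [0.0, 0.3, 0.6]))
    print("std dev:", round(std_dev([1.0, 2.0, 3.0, 4.0]), 3))
```

Applied to the reported INL extremes of −3.33 and 0.39 LSB, the constant correction leaves a worst-case error of about 1.86 LSB, roughly half of the original 3.33 LSB, which is consistent with the halving described above.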

Conclusions
A high-resolution short time interval measurement system has been proposed and tested. A novel delay line was realized using PVT-invariant PDEs. The measured standard deviation was as low as 36 ps with a resolution of 58 ps. The resulting DNL of the system was less than ±0.31 LSB, and the corrected INL was under ±0.24 LSB over the entire measurement range of the system. To obtain these results, all the elements were placed manually during the circuit realization. The whole system can be easily implemented in a single FPGA chip, which reduces cost and improves the resolution. Compared with traditional FPGA-based systems, this system offers a higher resolution. Compared with systems implemented in ASIC chips, it has a much lower cost but a comparable resolution. The single-chip, low-cost and high-resolution method therefore makes the proposed system highly attractive. Experimental results indicate that the proposed system can be used to measure short time intervals precisely.