1 Introduction

The requirements for higher-data-rate serial-link design will continue to grow with the increasing demand for multi-GHz communications data links. The CMOS technology scaling trend toward lower supply voltages makes it more difficult to drive higher-frequency signals off-chip. At high data rates, crosstalk, jitter, data skew and inter-symbol interference (ISI) become more challenging in the design of serial-link transceivers. Clock jitter is dominated by power-supply and substrate noise; as a result, it does not scale with CMOS technology. Data skew and clock jitter degrade signal integrity and worsen the trade-off between signal rate and transmission distance. In serial links where the clock is not sent with the transmitted data but is instead generated at the receiver, the transmitter and receiver clocks are asynchronous, which results in a frequency offset between them. Because of device mismatch, this offset generates errors in the recovery process. To generate an accurate high-frequency reference clock, serial links use a phase-locked loop (PLL) circuit. As the data rate increases, the PLL has to run at higher frequencies while maintaining high accuracy [1]. In addition, the transmission channel limits the achievable link data rate because of channel attenuation and reflections. Pre-emphasis circuits at the transmitter, such as those presented in [2–4], and equalization circuits [5, 6] at the receiver are used to compensate for channel attenuation and reflections and thereby reduce ISI.

In [7], the authors increased the number of transmitted bits, and hence the data rate, by combining two modulation schemes, PAM and PWM. In this architecture, 4 bits at 250 Mb/s each can be transmitted: 2 bits modulate the signal amplitude and 2 bits modulate the pulse width. The receiver includes a PAM demodulator, implemented as a flash analog-to-digital converter (ADC), and a PWM demodulator that uses a PLL with a 7-stage ring voltage-controlled oscillator (VCO).

In [8, 9], the authors introduced low-power serial links using a PWM scheme. In [10, 11], the authors introduced a low-power time-based serial link using a PPM scheme. In this paper, a new time-based high-speed link architecture is proposed. It addresses some of the issues above by modulating both the rising and the falling edges of the input clock signal and embedding a clock pulse in the transmitted signal. The required input clock frequency is only as high as the per-bit input rate and is independent of the number of transmitted bits. The proposed link uses simple circuitry, which reduces power consumption and circuit complexity.

2 DTS-Based Architecture

2.1 DTS Architecture

A time-based PPM-TDC architecture, proposed by the authors in [10, 11], is shown in Fig. 1(a). It uses only one modulation approach and decouples the clock from the data, which are transmitted over two separate channels. To increase the data rate and significantly reduce the impact of clock jitter on the data link, the architecture is modified in two ways, as shown in Fig. 1(b). First, the new architecture modulates both the rising and the falling edges of the input clock signal to generate a data signal. Second, a reference clock pulse signal is generated and combined with the data signal for transmission over a single channel.

Figure 1 The block diagram of the PPM-TDC link and the DTS link.

DTS signaling involves generating individual PPM signals with both edges modulated, generating a reference clock pulse, and combining these signals to obtain the final transmitted signal. Figure 2 shows the steps for generating the transmitted signal from the input clock signal. Figure 2(a) shows the input clock to the transmitter circuit, with period T and an approximately 50% duty cycle. Fig. 2(b) shows the data signal, which is generated from the main clock signal. Figure 2(c) shows the derived clock pulse signal, which has a duty cycle of less than 50% and can be generated from the main clock. Figure 2(d) shows the data signal combined with the reference clock pulse.

Figure 2 Transmitted signal generation.

2.2 Delay Encoding Algorithm

Delay encoding is the process of assigning edge delays with different weights to each of the bits in the data sequence. Each data pattern is partitioned into two sub-patterns: one is assigned to the rising edge of the clock and the other to the falling edge. The rising and falling edges are modulated independently, as shown in Fig. 2(b). The rising edge is modulated over the time \( T_{PPM1} \) and the falling edge over the time \( T_{PPM2} \), with a time resolution (time step) τ. Here τ is defined as the shift in the positive or negative edge of the data pulse when the LSB changes between logic 0 and 1. Thus τ corresponds to the least significant bit (LSB) of a bit sequence. For a given binary data sequence \( (B_{N-1}, B_{N-2}, \ldots, B_2, B_1, B_0) \), where \( B_0 \) is the LSB and \( B_{N-1} \) is the MSB, the delay values are encoded as follows:

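For bit \( B_i \), the edge delay assignment is \( 2^{i} \cdot \tau \cdot B_i \), where \( i = 0, 1, 2, \ldots, N-1 \). If \( B_i \) is zero, no delay is assigned. The total delay of an edge is therefore \( \sum_{i=0}^{N-1} 2^{i} \tau B_i \), i.e., τ times the integer value of the code.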

Table 1 shows the total delay assigned to either the rising or the falling edge of the input clock signal for N-bit input data \( (B_{N-1}, B_{N-2}, \ldots, B_2, B_1, B_0) \), for all possible binary combinations. This hierarchical (binary-weighted) delay assignment guarantees a one-to-one mapping of data codes to delay values. In other words, every data code results in a unique delay. The delay difference between two consecutive codes is τ. Accordingly, the DTS system is designed to guarantee the recovery and decoding of DTS signals with the finest possible delay resolution, i.e., the smallest achievable τ.

Table 1 The delay assignment for N-bit input data \( (B_{N-1}, B_{N-2}, \ldots, B_2, B_1, B_0) \).
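The binary-weighted assignment can be checked with a short Python sketch that rebuilds the Table 1 mapping. It rests on a few assumptions. Codes are written MSB first, and the 62.5 ps resolution of the 3 Gb/s design in Section 7 is used only as an example value. The helper names `edge_delay` and `delay_table` are illustrative.

```python
def edge_delay(bits: str, tau: float) -> float:
    """Delay assigned to one edge for an MSB-first bit string.

    Bit B_i (i = 0 is the LSB) contributes 2**i * tau * B_i.
    """
    total = 0.0
    for i, b in enumerate(reversed(bits)):  # reversed() puts the LSB first
        if b not in "01":
            raise ValueError(f"invalid bit {b!r}")
        total += (2 ** i) * tau * int(b)
    return total


def delay_table(n_bits: int, tau: float) -> dict[str, float]:
    """Total edge delay for every n_bits-wide code (the Table 1 mapping)."""
    codes = (format(c, f"0{n_bits}b") for c in range(2 ** n_bits))
    return {code: edge_delay(code, tau) for code in codes}


for code, delay in delay_table(3, tau=62.5).items():  # tau in ps
    print(code, f"{delay:6.1f} ps")
```

Because the delay equals τ times the integer value of the code, consecutive codes differ by exactly τ and no two codes share a delay.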

2.3 The Transmitted Signal

Let N1 and N2 be the numbers of bits transmitted by modulating the rising edge and the falling edge, respectively. For a given time budget, the smaller τ is, the more bits can be encoded into each modulated data symbol. Figure 2(d) shows the single-ended eye diagram of the final transmitted signal after the data signal is merged with the reference clock-pulse signal. The eye diagram of a DTS symbol differs from a conventional eye diagram because different data codes shift the signal edges to different positions. Hence, the eye diagram shows the reference clock pulse at the beginning, followed by a series of rising and falling edge transitions that represent the PPM time modulation of both edges. Moreover, the symbol time of the eye diagram equals the clock period. As a result, since every symbol carries N1 + N2 bits, the link data rate is \( (N1 + N2) \cdot F_{clk} \) bit/s, where \( F_{clk} \) is the input clock frequency. The system time resolution τ, the clock frequency and the time budget allowed for modulation together determine the overall system performance. Figure 3 shows the timing diagram of the transmitted signal (solid line), where T is the clock period. The dotted lines represent the other possible positions of the positive and negative edges of the data pulse. As shown in the figure, the data pulse in the first cycle carries the code “0001” and the data pulse in the second cycle carries the code “1111”. The two rightmost bits of each code modulate the positive edge of the data pulse, while the two leftmost bits modulate the negative edge.
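The Fig. 3 example can be reproduced with a small sketch that splits each code between the two edges and computes the resulting delays and link rate. Delays are returned in units of τ (τ = 1.0) because Fig. 3 does not state a value. The function names are hypothetical.

```python
def split_code(code: str, n1: int) -> tuple[str, str]:
    """Split an MSB-first code into (rising-edge bits, falling-edge bits).

    As in Fig. 3, the rightmost n1 bits modulate the rising (positive)
    edge and the remaining leftmost bits modulate the falling edge.
    """
    if not 0 < n1 < len(code):
        raise ValueError("each edge needs at least one bit")
    return code[-n1:], code[:-n1]


def edge_delays(code: str, n1: int, tau: float) -> tuple[float, float]:
    """(rising-edge delay, falling-edge delay) in binary-weighted steps of tau."""
    rise_bits, fall_bits = split_code(code, n1)
    return int(rise_bits, 2) * tau, int(fall_bits, 2) * tau


def link_rate(n1: int, n2: int, f_clk: float) -> float:
    """Each clock period carries one symbol of N1 + N2 bits (result in bit/s)."""
    return (n1 + n2) * f_clk


print(edge_delays("0001", 2, tau=1.0))  # rising edge 1 tau, falling edge 0
print(edge_delays("1111", 2, tau=1.0))  # both edges at 3 tau
print(link_rate(4, 4, 500e6))           # 8 bits per 2 ns symbol
```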

Figure 3 The transmitted signal timing diagram.

3 Time Budget

This section explains the design steps required to calculate the time budgets \( T_{PPM1} \) and \( T_{PPM2} \), the system resolution τ and the numbers of bits N1 and N2 shown in Fig. 2. From an input clock signal of period T with an approximately 50% duty cycle, a synchronized clock signal with the same period and a pulse width \( T_P \) is generated, as shown in Fig. 2(c). This signal is the reference clock pulse that is embedded into the transmitted signal. The data pulses are generated and then combined with the reference clock pulses to form the transmitted signal. A fixed guard time \( T_{min} \), shown in Fig. 2(d), is required; the larger the guard time, the lower the chance of ISI between pulses. The minimum pulse width of the data signal is \( T_D \), as shown in Fig. 2(b). The values of \( T_{min} \), \( T_P \) and \( T_D \) are chosen by the designer. \( T_D \) and \( T_P \) must be large enough for the corresponding pulses to propagate through, and be recovered by, the circuits in a given fabrication process. The total time budget of the two PPM windows is \( T_{PPM1} + T_{PPM2} = T - (T_P + 2T_{min} + T_D) \). The number of transmitted bits is N = N1 + N2, where N1 follows from \( 2^{N1} = T_{PPM1}/\tau \) and N2 from \( 2^{N2} = T_{PPM2}/\tau \), and τ is the link time resolution. Either N or τ is chosen by the designer. The smaller τ is, the more bits can be transmitted. However, τ must be large enough for the receiver to distinguish between two edges that represent consecutive codes. \( T_{PPM1} \) and \( T_{PPM2} \) can be chosen to be equal, in which case the total number of transmitted bits is N = 2*N1 = 2*N2.

The following example shows how these values can be chosen. Assume that N = 8 bits at 500 Mb/s each need to be transmitted. The required input clock frequency is then 500 MHz, so T = 2 ns. With N1 = N2 = 4, let \( T_P \), \( T_{min} \) and \( T_D \) all be 180 ps. Then \( T_{PPM1} = T_{PPM2} = (2000 - 720)/2 = 640 \) ps and \( \tau = 640/2^4 = 40 \) ps. It follows that a 4 Gb/s DTS link (8 bits * 500 Mb/s) can be implemented with a 500 MHz clock.
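The budget calculation above can be written as a small helper. It assumes the equal split \( T_{PPM1} = T_{PPM2} \) and N1 = N2 = N/2 used in the text. The name `time_budget` is illustrative, and all times are in picoseconds.

```python
def time_budget(t_clk: float, t_p: float, t_min: float, t_d: float,
                n_bits: int) -> dict[str, float]:
    """Split one clock period into the DTS time slots (times in ps).

    Assumes T_PPM1 = T_PPM2 and N1 = N2 = N / 2, as in the text.
    """
    if n_bits <= 0 or n_bits % 2:
        raise ValueError("the equal split needs a positive, even bit count")
    total_ppm = t_clk - (t_p + 2 * t_min + t_d)  # T_PPM1 + T_PPM2
    if total_ppm <= 0:
        raise ValueError("the fixed slots leave no time for modulation")
    t_ppm = total_ppm / 2            # T_PPM1 = T_PPM2
    n_edge = n_bits // 2             # N1 = N2
    tau = t_ppm / 2 ** n_edge        # from 2**N1 = T_PPM1 / tau
    return {"T_PPM1": t_ppm, "T_PPM2": t_ppm, "N1": n_edge, "N2": n_edge, "tau": tau}


print(time_budget(2000, 180, 180, 180, 8))  # the 4 Gb/s example above
```

The 3 Gb/s design described in Section 7 corresponds to `time_budget(2000, 250, 250, 250, 6)`, which gives 500 ps windows and τ = 62.5 ps.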

Modulating both the rising and the falling edges of the input clock signal allows more bits to be transmitted than modulating a single edge over the whole time budget, as in the PPM-TDC link [10]. To illustrate, consider a total time budget \( T_b \) that is split equally between the two modulation schemes. The numbers of transmitted bits N1 and N2 are then:

$$ N1 = N2 = \frac{{ln(\frac{{{T_b}/2}}{\tau })}}{{ln(2)}} $$

The total number of transmitted bits N then equals N1 + N2 = 2*N1 = 2*N2, assuming \( T_{PPM1} = T_{PPM2} = T_b/2 \):

$$ N = 2*\frac{{ln(\frac{{{T_b}/2}}{\tau })}}{{ln(2)}} $$
(1)

By contrast, if the whole time budget \( T_b \) is used for a single modulation scheme, the total number of transmitted bits is [10]:

$$ \begin{aligned} N &= \frac{\ln(T_b/\tau)}{\ln(2)} = \frac{\ln\left[2 \cdot \frac{T_b/2}{\tau}\right]}{\ln(2)} \\ &= \frac{\ln(2) + \ln\left(\frac{T_b/2}{\tau}\right)}{\ln(2)} \\ &= 1 + \frac{\ln\left(\frac{T_b/2}{\tau}\right)}{\ln(2)} \end{aligned} $$
(2)

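Comparing Eqs. 1 and 2 shows that modulating both edges allows more bits to be transmitted whenever \( \frac{\ln\left(\frac{T_b/2}{\tau}\right)}{\ln(2)} > 1 \). Writing x for this quantity, Eq. 1 gives N = 2x and Eq. 2 gives N = 1 + x, and 2x > 1 + x exactly when x > 1. The two schemes break even at x = 1, i.e., for a 2-bit link. Modulating both edges is therefore advantageous for every link design that carries more than 2 bits.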

In the previous example, the total time budget is \( T_{Total} = T_{PPM1} + T_{PPM2} = 1280 \) ps. Modulating only one edge with the same link resolution (τ = 40 ps) gives, from Eq. 2, \( N = \log_2(1280/40) = 5 \) bits. Modulating both clock edges allows 8 bits to be transmitted in the same time budget.

To transmit 8 bits with the PPM-TDC architecture of [10] at the same input clock frequency, the required link resolution would be \( \tau = \frac{T/2}{2^N} = \frac{1000\ \text{ps}}{256} \approx 3.9\ \text{ps} \) [10]. This is roughly ten times finer than the 40 ps resolution of the DTS architecture.
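Equations 1 and 2 and the PPM-TDC resolution requirement can be compared numerically with the sketch below, using the 1280 ps budget and 2 ns period of the example. It returns the unrounded values of the equations; a real design would round the bit count down. The function names are illustrative.

```python
import math


def bits_both_edges(t_b: float, tau: float) -> float:
    """Eq. 1: N = 2 * log2((T_b / 2) / tau), budget split equally between edges."""
    return 2 * math.log2((t_b / 2) / tau)


def bits_one_edge(t_b: float, tau: float) -> float:
    """Eq. 2: N = log2(T_b / tau) = 1 + log2((T_b / 2) / tau)."""
    return math.log2(t_b / tau)


def tau_ppm_tdc(t_clk: float, n_bits: int) -> float:
    """PPM-TDC link [10]: N bits over T/2, so tau = (T / 2) / 2**N."""
    return (t_clk / 2) / 2 ** n_bits


print(bits_both_edges(1280, 40), bits_one_edge(1280, 40))  # 8.0 vs 5.0 bits
print(bits_both_edges(160, 40), bits_one_edge(160, 40))    # break-even at 2 bits
print(tau_ppm_tdc(2000, 8))                                # about 3.9 ps, vs 40 ps for DTS
```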

4 The Transmitter Circuit

4.1 The Data Signal Generation

The transmitter circuit diagram is shown in Fig. 4. The circuit has two PPM circuits. The first PPM circuit modulates the positive edge of the 50% duty-cycle input clock signal and implements the first PPM scheme. The second PPM circuit modulates the negative edge of the input clock signal with respect to the positive edge of the main clock signal. Any PPM circuit can be used in the transmitter design. However, the hierarchical PPM circuit proposed by the authors in [12] and shown in Fig. 5 was used because of its low power consumption, single-cycle latency and circuit simplicity. As shown in Fig. 5, the number of stages equals the number of bits, with the delays assigned as explained in Section 2. \( A_0 \) represents the LSB and \( A_{N-1} \) the MSB. An n-bit PPM circuit can therefore serve as a PPM circuit of any width from 1 to n bits by selecting the appropriate output signal, as shown in Fig. 5. Each PPM output signal is fed to an AND gate together with its reference clock (clk out), as shown in Fig. 4, to fix the position of the non-modulated edge of each signal. The D-type flip-flop in Fig. 4 combines the output signals of the two PPM circuits into a data signal whose two edges are modulated independently. CP is the rising-edge-triggered clock input, CD is the active-high clear input and Data is the data input.
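The flip-flop stage can be illustrated with a sampled behavioral model. The exact wiring of Fig. 4 is assumed here. The D input is held high, the gated PPM-1 output drives CP, and the gated PPM-2 output drives the active-high clear CD. The sample-index timeline and the helper names are also hypothetical. In this model, moving the CP pulse moves only the rising edge of Q, and moving the CD pulse moves only its falling edge.

```python
def pulse(n: int, start: int, stop: int) -> list[int]:
    """Logic waveform of n samples that is high for start <= k < stop."""
    return [1 if start <= k < stop else 0 for k in range(n)]


def dff_with_clear(cp: list[int], cd: list[int], d: int = 1) -> list[int]:
    """Rising-edge-triggered D flip-flop with an asynchronous active-high clear."""
    q, prev_cp, out = 0, 0, []
    for c, clr in zip(cp, cd):
        if clr:
            q = 0                    # clear overrides the clock
        elif c and not prev_cp:
            q = d                    # capture D on a rising edge of CP
        prev_cp = c
        out.append(q)
    return out


# Hypothetical timeline in sample steps: PPM-1 edge at 3, PPM-2 edge at 12.
q = dff_with_clear(cp=pulse(20, 3, 10), cd=pulse(20, 12, 16))
print("".join(map(str, q)))
```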

Figure 4 The DTS transmitter block diagram.

Figure 5 The PPM circuit diagram.

4.2 Transmitted Signal Generation

The OR gate in Fig. 4 combines the data signal with the clock pulse signal to produce the transmitted signal. Figure 6 shows the timing diagram of the signals marked in Fig. 4 as single-ended eye diagrams, illustrating how the transmitted signal is generated from the input clock signal. Figure 6(a) and (b) show the input and inverted clock signals, respectively. Figure 6(c) and (d) show the PPM output signals, in which both edges have been modulated. Figure 6(e) and (f) show the PPM output signals after the AND gates; each now has one modulated edge and one non-modulated edge. Figure 6(g) shows the flip-flop output signal, in which the positive and negative edges are modulated independently and combined in one signal. Figure 6(h) shows the reference clock pulse signal, generated from the main clock signal. Figure 6(i) shows the DTS transmitted signal, with the clock pulse embedded, before the pre-emphasis circuit. The transmitter is followed by a pre-emphasis circuit to compensate for channel attenuation.
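A sampled model of the OR-combined output makes the symbol layout concrete. It assumes the slot order \( T_P \), \( T_{min} \), \( T_{PPM1} \), \( T_D \), \( T_{PPM2} \), \( T_{min} \) used in Sections 3 and 7, with each code's delay measured from the start of its PPM window. The defaults are the 3 Gb/s values (T = 2 ns, 250 ps slots, 3 bits per edge). The function names are hypothetical.

```python
def dts_symbol(rise_code: int, fall_code: int, *, t_clk: float = 2000.0,
               t_p: float = 250.0, t_min: float = 250.0, t_d: float = 250.0,
               n_edge: int = 3, dt: float = 2.5) -> list[int]:
    """Sample one DTS symbol: reference clock pulse OR data pulse (times in ps)."""
    t_ppm = (t_clk - (t_p + 2 * t_min + t_d)) / 2
    tau = t_ppm / 2 ** n_edge
    for code in (rise_code, fall_code):
        if not 0 <= code < 2 ** n_edge:
            raise ValueError("code out of range")
    data_rise = t_p + t_min + rise_code * tau
    data_fall = t_p + t_min + t_ppm + t_d + fall_code * tau
    samples = []
    for k in range(int(t_clk / dt)):
        t = k * dt
        clock = t < t_p                         # reference clock pulse
        data = data_rise <= t < data_fall       # modulated data pulse
        samples.append(int(clock or data))      # the OR gate
    return samples


def edges(samples: list[int], dt: float) -> tuple[list[float], list[float]]:
    """Rising and falling transition times (signal assumed low before t = 0)."""
    rises, falls, prev = [], [], 0
    for k, s in enumerate(samples):
        if s and not prev:
            rises.append(k * dt)
        elif prev and not s:
            falls.append(k * dt)
        prev = s
    return rises, falls


print(edges(dts_symbol(2, 5), dt=2.5))
```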

Figure 6 The single-ended eye diagram of the signals shown in Fig. 4.

5 The Receiver Circuit

The receiver circuit is shown in Fig. 7. The first stage is a comparator that detects and amplifies the received signal. The second building block is the separation circuit shown in Fig. 8. It separates the reference clock pulse and the data pulse in the received signal so that the transmitted codes can be recovered. The circuit consists of two JK flip-flops operating in toggle mode, with the J and K inputs tied to logic “1”, as shown in Fig. 8. A pulse-width increase circuit then increases the duty cycle of the clock pulse signal to guarantee correct operation of the receiver TDC circuitry. The falling edge of the widened signal is not used in the recovery; however, widening the pulse prevents it from being degraded by signal dispersion. The duty cycle should be brought as close to 50% as practical, but it does not have to match an exact value.
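The function of the separation stage can be sketched at the pulse level. Each JK flip-flop with J = K = 1 toggles on every pulse, so consecutive pulses are steered alternately to the clock and data outputs. The sketch models only that divide-by-two behavior, not the gate-level circuit of Fig. 8. It assumes the first pulse seen is a reference clock pulse and represents each pulse as a (rise, fall) pair in ps. The widening step keeps the rising edge and stretches the pulse to a chosen duty cycle, mirroring the pulse-width increase circuit. The helper names are illustrative.

```python
Pulse = tuple[float, float]  # (rising-edge time, falling-edge time) in ps


def separate(pulses: list[Pulse]) -> tuple[list[Pulse], list[Pulse]]:
    """Steer alternate pulses to the clock and data outputs."""
    clock: list[Pulse] = []
    data: list[Pulse] = []
    toggle = 0
    for p in pulses:
        (clock if toggle == 0 else data).append(p)
        toggle ^= 1  # J = K = 1: the state flips on every pulse
    return clock, data


def widen(clock_pulses: list[Pulse], t_clk: float, duty: float = 0.5) -> list[Pulse]:
    """Stretch each clock pulse to duty * T; only the rising edge is used later."""
    return [(rise, rise + duty * t_clk) for rise, _ in clock_pulses]


received = [(0.0, 250.0), (625.0, 1562.5), (2000.0, 2250.0), (2500.0, 3000.0)]
clk, dat = separate(received)
print(widen(clk, t_clk=2000.0), dat)
```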

Figure 7 The receiver block diagram.

Figure 8 The separation circuit.

The time-difference calibration circuits in Fig. 7 are placed before each TDC circuit to calibrate the time difference between the widened clock signal and the data signal. The TDC-1 circuit converts the time difference between the rising edge of the widened clock pulse signal and the rising edge of the data signal into an N1-bit binary code corresponding to the transmitted code. The TDC-2 circuit converts the time difference between the rising edge of the widened clock pulse signal and the rising edge of the inverted data signal (i.e., the falling edge of the data signal) into an N2-bit binary code. Here N1 is the number of bits transmitted by modulating the positive edge of the input clock signal, and N2 is the number of bits transmitted by modulating its negative edge.
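As a minimal sketch, assume the calibration circuits remove the fixed offset of each edge. The two TDC outputs can then be modeled as ideal quantizers and concatenated back into the transmitted code. The offsets follow the 3 Gb/s slot layout described in Section 7: 500 ps to the start of the rising-edge window and 1250 ps to the start of the falling-edge window. As in Fig. 3, the falling-edge bits are placed on the left. The function name is hypothetical, and the real TDC of Fig. 9 uses its own decision thresholds.

```python
def decode_symbol(t_rise: float, t_fall: float, *, rise_offset: float,
                  fall_offset: float, tau: float, n1: int, n2: int) -> str:
    """Ideal TDC-1/TDC-2 model: edge times (ps, relative to the clock rise) -> code."""

    def quantize(delta: float, n_bits: int) -> int:
        # Nearest multiple of tau, clamped to the valid code range.
        # Exact ties at tau/2 follow Python's round-half-to-even.
        code = round(delta / tau)
        return min(max(code, 0), 2 ** n_bits - 1)

    rise_code = quantize(t_rise - rise_offset, n1)   # TDC-1: positive edge
    fall_code = quantize(t_fall - fall_offset, n2)   # TDC-2: negative edge
    return format(fall_code, f"0{n2}b") + format(rise_code, f"0{n1}b")


layout = dict(rise_offset=500.0, fall_offset=1250.0, tau=62.5, n1=3, n2=3)
print(decode_symbol(625.0, 1562.5, **layout))   # nominal edges
print(decode_symbol(640.0, 1550.0, **layout))   # edges shifted by a few ps
```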

The function of the TDC is to measure the time difference between the rising/falling edges of the reference clock and the data signal right after the separation circuit. The TDC decodes the time difference into a binary sequence that represents the original data encoded at the transmitter. In order to achieve the required throughput of the data link, the TDC architecture must guarantee the completion of data conversion in one cycle or over multiple cycles in a pipelined fashion. Pipelined TDC implementations, such as flash TDCs [13, 14], are usually associated with a wide range of time resolutions where coarse and fine resolution TDC blocks are run in parallel.

The time differences handled by the TDC in this data link are much smaller than those measured by flash TDCs, so a fine-resolution TDC is needed. The TDC used in the receiver is a modified version of the single-cycle-latency circuit described by the authors in [15], shown in Fig. 9. The architecture has single-cycle latency and low power consumption. Its operation is split into two stages, signal propagation and data decoding, which run in parallel. The signal-propagation segment processes the encoded data edges while the decoder segment continuously compares the processed data edges with the reference clock edge. The delay values assigned to the delay elements were selected to cover all possible delays in a hierarchical fashion. The XOR gates between levels of the delay hierarchy record the signal transition history before and after every delay element. The circuit differs from that in [15] in two ways. First, the data signal is used as the trigger clock of the D-type flip-flops, with the widened clock signal as their data input. Second, the multiplexer connections have been changed. These modifications were needed because, for some codes, the data pulse is not wide enough to propagate through the XOR gates, since the pulse width at an XOR gate output equals half of the pulse width at its input. The widened clock signal, by contrast, is wide enough to propagate. At each level of signal propagation, the decoder uses the reference clock edge, the processed data at the current hierarchy level and the bit decoded at the previous level to decode the bit at the current level.

Figure 9 The modified TDC circuit.

Figure 10 shows the timing diagram of the signals at different points in the receiver circuit. Figure 10(a) shows the timing diagram of the received signal after the buffer stage. Figure 10(b) is the data signal after separation, and Fig. 10(c) shows the reference clock pulse signal after separation from the received signal. Figure 10(d) shows the widened clock signal at the output of the pulse width increase circuit. The transmitted codes can be recovered by converting the time difference between signal edges into binary codes.

Figure 10 The eye diagram of the separation circuit output signals and the widened clock signal.

6 3Gb/s CMOS DTS Link Design

6.1 Link Circuitry

To illustrate a monolithic CMOS implementation, a 6-bit, 3 Gb/s DTS link has been designed and simulated with Cadence tools in a 90 nm mixed-signal CMOS process. The input bit rate is 500 Mb/s per bit, and the link uses a 500 MHz input clock. The transmitter and the receiver were designed as explained in Sections 4 and 5. Figure 11 shows the pre-emphasis circuit used in the link design. It consists of four differential-pair stages: the first is the driver stage and the other three are tap stages. The buffers in Fig. 11 delay the input signal before the tap stages; they are taken from the buffers available in the Cadence library. The resistance R is set to 50 Ω to match the pre-emphasis circuit to the 50 Ω FR-4 channel used to demonstrate the link behavior. Figure 12 shows the comparator circuit, which consists of four differential amplifier stages. Its inputs are terminated in 50 Ω to match the comparator input impedance to the FR-4 channel impedance, and RL is set to 1.1 kΩ.
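The effect of the driver and the three tap stages can be approximated by a discrete-time FIR model. In this model, each tap adds a weighted copy of the input delayed by a whole number of buffer delays. The tap weights and the per-tap delay below are hypothetical placeholders, not the values of the circuit in Fig. 11. Negative post-cursor weights boost transitions and de-emphasize long runs, which is how pre-emphasis offsets the low-pass loss of the channel.

```python
def pre_emphasis(levels: list[float], taps: list[float], delay: int = 1) -> list[float]:
    """y[n] = sum_k taps[k] * x[n - k*delay]; samples before the start count as zero.

    taps[0] is the driver (main) weight; taps[1:] are the tap stages, each fed
    by a copy of the input delayed by one more buffer delay.
    """
    out = []
    for n in range(len(levels)):
        acc = 0.0
        for k, c in enumerate(taps):
            m = n - k * delay
            if m >= 0:
                acc += c * levels[m]
        out.append(acc)
    return out


x = [-1.0] * 4 + [1.0] * 5                 # one low-to-high transition
taps = [1.0, -0.25, -0.1, -0.05]           # hypothetical 3-tap weights
print([round(v, 2) for v in pre_emphasis(x, taps)])
```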

Figure 11 The 3-tap pre-emphasis circuit diagram.

Figure 12 The comparator circuit.

6.2 Channel Modeling

A 40-inch FR-4 channel is used as the transmission medium for the designed link. An S-parameter table of the channel was generated with ADS and used in Cadence to simulate the channel, including the equivalent circuits of the wire bonds and chip pads. Figure 13 shows the S11 and S21 curves of the channel obtained with Cadence.

Figure 13 The S11 and S21 curves of the FR-4 channel used in the link design.

7 Simulation Results

7.1 3 Gb/s DTS Link Simulation Results

In the 6-bit, 3 Gb/s link, N1 and N2 are chosen as \( N1 = N2 = \frac{N}{2} = 3 \). Since T = 2 ns, the time budget values are chosen as \( T_P = T_{min} = T_D = 250 \) ps. This gives \( T_{PPM1} = T_{PPM2} = 0.5 \) ns and a link time resolution of \( \tau = 0.5\ \text{ns}/2^3 = 62.5 \) ps. Figure 14 shows the transmitter waveforms; the signal numbers refer to the transmitter circuit diagram in Fig. 4. The waveforms match those in Fig. 6 and show that the rising and falling edges of the data pulse are equally spaced, as designed. Waveforms 9 and 10 in Fig. 14 show the transmitted signal at the input and the output of the 3-tap pre-emphasis circuit, respectively.

Figure 14 The simulated signals 1 to 10 indicated in the transmitter circuit diagram in Fig. 4. Time is represented in ns.

Figure 15(a) shows the transmitted signal at the output of the 40-inch FR4 channel and before the comparator circuit. Figure 15(b) shows the received signal at the output of the comparator circuit. The signal edges are spaced with the same time spacing as designed. Figure 15(c) and (d) show the separation circuit output signals. Figure 15(c) presents the eye diagram of the data signal and Fig. 15(d) indicates the eye diagram of the clock pulse signal. After separating the signals, the TDC circuits convert the time difference between signal edges into binary codes.

Figure 15 The eye diagram of the received signal and the separation circuit output signals. Time is represented in ns.

The signal in Fig. 15(b) shows some jitter on the falling edge of the reference clock pulse. This jitter is caused by ISI arising from the different spacings between the reference clock pulse and the data pulse, which is visible on the falling edge of the reference clock pulse in Fig. 15(a). Because the falling edge of the reference clock pulse is not used to recover the transmitted bits, this jitter does not affect the recovery process. The smaller jitter on the other signal edges is tolerated by the TDC circuit, as explained in [10].

The energy concentration of the transmitted signal within different bandwidths has been calculated for 3 Gb/s and 4 Gb/s DTS links. Table 2 compares these results with those of the architecture proposed in [10] at the same link rates. The table indicates that the proposed link concentrates the transmitted signal energy in a low bandwidth similar to that of the PPM-TDC link. However, the DTS link transmits more bits because it modulates both edges of the input clock signal.

Table 2 The percentage energy concentration of the transmitted signal in different bandwidths.
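The text does not state how the energy concentration in Table 2 was computed. One straightforward method is to take the discrete Fourier transform of a sampled transmitted waveform. By Parseval's theorem, the energy in the bins with |f| ≤ B can then be compared with the total. The sketch uses a naive O(N²) DFT so that it needs only the standard library; numpy.fft would be used for long records. The function name and the test tone are illustrative, and DC energy is counted as in-band.

```python
import cmath
import math


def energy_fraction(samples: list[float], fs: float, bandwidth: float) -> float:
    """Share of the signal energy at frequencies |f| <= bandwidth (Hz)."""
    n = len(samples)
    total = inband = 0.0
    for k in range(n):
        xk = sum(s * cmath.exp(-2j * math.pi * k * m / n)
                 for m, s in enumerate(samples))
        power = abs(xk) ** 2
        f = k * fs / n if k <= n // 2 else (k - n) * fs / n  # signed bin frequency
        total += power
        if abs(f) <= bandwidth:
            inband += power
    return inband / total


fs = 20e9                                   # 50 ps sampling step
tone = [math.sin(2 * math.pi * 1e9 * m / fs) for m in range(200)]
print(energy_fraction(tone, fs, 1.5e9), energy_fraction(tone, fs, 0.5e9))
```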

Table 3 shows a comparison between the proposed DTS link with other serial links in terms of power consumption, the required input clock frequency and chip area.

Table 3 A comparison between the DTS link and the SerDes link in terms of the chip area and the power consumption.

At the same data rate, the DTS link uses a lower input clock frequency than the embedded-clock SerDes architecture. The interleaving SerDes architecture uses the same input clock frequency as the DTS link at the same data rate. However, it requires a very precise external clock signal and strict timing between the clock and data signals. This makes the design very challenging and complex, resulting in higher cost, area and power consumption than other SerDes architectures. Moreover, the transmitted signal of the interleaving SerDes, like that of other SerDes architectures, occupies more bandwidth than the DTS transmitted signal.

Table 4 compares the PPM-TDC link and the proposed link in terms of link resolution, required input clock frequency and number of transmission channels. The table indicates that the proposed link relaxes the circuit design requirements because it allows larger values of τ than the PPM-TDC link. In addition, the DTS link uses only one transmission channel. The DTS link resolution τ listed in Table 4 for the 3 Gb/s link assumes \( T_P = T_D = T_{min} = 250 \) ps. For the 4 Gb/s and 6 Gb/s links, \( T_P \), \( T_D \) and \( T_{min} \) are reduced to 150 ps to fit more bits into each symbol and achieve the link rate required for the comparison.

Table 4 A comparison between the PPM-TDC link and the proposed link in terms of the link resolution (τ) and number of transmission channels.

Table 5 shows the power consumption of each block in the proposed DTS link.

Table 5 The power consumption of each block in the 3 Gb/s DTS link.

7.2 Jitter Calculations

The DTS link significantly reduces the effect of jitter on the transmitted signal. The first jitter component is clock jitter. Because the data pulse is generated from the clock pulse, the clock jitter is carried over to the data pulse. The time difference between the clock and data edges therefore remains the same, as shown in Fig. 16.

Figure 16 Clock jitter cancellation in the transmitted signal. The top graph is the main clock signal and the lower graph is the transmitted signal.

The second jitter component is caused by noise generated in the link circuitry. To evaluate it, the jitter generated by circuit noise in the delay lines must be calculated. In [21], Abidi gave expressions for the mean and the variance of the propagation time of a standard CMOS inverter, shown in Eqs. 3 and 4:

$$ {t_p} = \frac{{{C_L}{V_{{dd}}}}}{{2{I_d}}} $$
(3)
$$ {\sigma^2} = \frac{{4KT\gamma {g_m}{t_p}}}{{2I_d^2}} $$
(4)

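where \( C_L \) is the load capacitance, \( V_{dd} \) is the supply voltage, \( I_d \) is the drain current and \( 4KT\gamma g_m \) is the long-channel drain noise current spectral density of a CMOS device in saturation. In this expression, K is Boltzmann's constant, T is the absolute temperature (not the clock period), γ is the channel thermal-noise coefficient (2/3 for long-channel devices) and \( g_m \) is the transconductance. The load capacitance can be calculated with the following equations [22]: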

$$ {C_L} = {C_{{self}}} + {C_{{fan - out}}} + {C_{{wire}}} $$
(5)
$$ {C_{{self}}} = {C_{{DB,N}}} + { }2{C_{{OL,N}}} + { }{C_{{DB,P}}} + { }2{C_{{OL,P}}} $$
(6)
$$ {C_{{fan - out}}} = { }{C_{{GS,N}}} + { }2{C_{{OL,N}}} + { }{C_{{GS,P}}} + { }2{C_{{OL,P}}} $$
(7)

where \( C_{GS} \) is the gate-to-source capacitance, \( C_{OL} \) is the gate-drain overlap capacitance and \( C_{DB} \) is the drain-bulk junction capacitance. The subscripts N and P refer to the n-channel and p-channel transistors of the standard inverter. \( C_{wire} \) is the capacitance of the metal interconnect, obtained from its capacitance per unit area. In [22], the author compared the theoretical and simulated jitter of a single-stage standard CMOS inverter using the above equations and obtained close agreement. For a delay line, the jitter variance is simply the sum of the jitter variances of the individual inverter stages, assuming independent noise in each stage. In this work, the theoretical and simulated jitter of a 125 ps delay line consisting of six inverters from the Cadence library have been compared. The simulated jitter variance is 1.10641 × 10⁻²⁵ s² and the calculated jitter variance is 1.11 × 10⁻²⁵ s², which shows good agreement.
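Equations 3 and 4 and the stage-sum rule for a delay line can be evaluated with the sketch below. The device values in the example (load capacitance, drain current, transconductance, γ = 2/3, 300 K) are hypothetical. They are not the parameters of the Cadence inverters used in this work, so the result is not expected to reproduce the 1.11 × 10⁻²⁵ s² figure. Stage noise is assumed independent so that variances add.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def inverter_delay(c_load: float, v_dd: float, i_d: float) -> float:
    """Eq. 3: mean propagation delay t_p = C_L * V_dd / (2 * I_d), in seconds."""
    return c_load * v_dd / (2 * i_d)


def inverter_jitter_variance(c_load: float, v_dd: float, i_d: float, g_m: float,
                             gamma: float = 2 / 3, temp: float = 300.0) -> float:
    """Eq. 4: sigma^2 = 4 K T gamma g_m t_p / (2 I_d^2), in s^2."""
    t_p = inverter_delay(c_load, v_dd, i_d)
    return 4 * K_B * temp * gamma * g_m * t_p / (2 * i_d ** 2)


def delay_line_jitter_variance(stages: list[dict]) -> float:
    """Independent stage noise: the delay-line variance is the sum of stage variances."""
    return sum(inverter_jitter_variance(**s) for s in stages)


stage = dict(c_load=5e-15, v_dd=1.0, i_d=100e-6, g_m=1e-3)  # hypothetical values
var = delay_line_jitter_variance([stage] * 6)
print(f"t_p = {inverter_delay(5e-15, 1.0, 100e-6) * 1e12:.1f} ps per stage, "
      f"RMS jitter = {math.sqrt(var) * 1e12:.3f} ps over six stages")
```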

To model the overall jitter variance of the transmitted signal, note that the data pulse propagates through different circuits than the reference clock pulse. Since the information is carried by the time difference between the data pulse edges and the reference clock pulse edge, the effective jitter must be examined. The effective jitter is defined as the jitter of the time difference between the positive or negative edge of the transmitted data pulse and the positive edge of the reference clock pulse.
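The benefit of measuring the time difference can be illustrated with a short Monte-Carlo sketch. It assumes Gaussian edge jitter made of two parts: a component shared by the clock and data edges (jitter inherited from the input clock, as in Fig. 16) and independent path noise. The shared component cancels in the difference, so only the independent parts contribute to the effective jitter. The closed-form variance for correlated Gaussian edges is included for comparison. All names and values are illustrative.

```python
import math
import random


def effective_jitter_variance(var_data: float, var_clock: float, rho: float = 0.0) -> float:
    """Var(t_data - t_clock) for edge jitters with correlation coefficient rho."""
    return var_data + var_clock - 2 * rho * math.sqrt(var_data * var_clock)


def monte_carlo_difference(sigma_common: float, sigma_data: float, sigma_clock: float,
                           n: int = 50_000, seed: int = 1) -> float:
    """Sample variance of t_data - t_clock when both edges share a common jitter term."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        common = rng.gauss(0.0, sigma_common)           # jitter of the input clock
        t_clock = common + rng.gauss(0.0, sigma_clock)  # reference-pulse path noise
        t_data = common + rng.gauss(0.0, sigma_data)    # data-pulse path noise
        diffs.append(t_data - t_clock)
    mean = sum(diffs) / n
    return sum((d - mean) ** 2 for d in diffs) / (n - 1)


# Large shared jitter (10 ps RMS), small independent path noise (1 ps RMS each):
print(monte_carlo_difference(10.0, 1.0, 1.0))          # close to 1 + 1 = 2 ps^2
print(effective_jitter_variance(101.0, 101.0, rho=100 / 101))
```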

In the DTS link, the jitter variance due to the circuit noise in the transmitted signal changes with the input code, since the transmitted signal does not propagate through all delay lines for all codes. The jitter variance of the transmitted signal edges for the worst case (when the input code is all ones) has been simulated, including noise from the Mux circuits and the flip-flops. The results are shown in Table 6.

Table 6 The jitter variance and RMS values of the transmitted signal edges.

The RMS jitter values are much smaller than the link resolution (62.5 ps), which means that the DTS link can tolerate the jitter generated by circuit noise.

7.3 Monte-Carlo and Corners Simulation

Monte-Carlo simulations of the 3 Gb/s DTS link, including process variation and mismatch, have been performed in Cadence. The simulated variance of the data pulse edge jitter is 1.065 × 10⁻²¹ s², corresponding to an RMS jitter of 32.6 ps, approximately half of the 62.5 ps link resolution. As explained later in this section, the TDC can tolerate this amount of jitter. The jitter of the positive edge of the reference clock pulse has also been simulated: its variance is 1.34 × 10⁻²¹ s² and its RMS value is 36 ps. Thus, with process variation and mismatch, the RMS jitter at the clock edge is close to that at the data edge, and the time difference between the reference clock edge and the data edge remains close to its designed value. The proposed link stores the information in the time difference between the positive edge of the reference clock pulse and both edges of the data pulse. The time difference between the positive edge of the clock pulse and the positive edge of the data pulse has therefore been simulated over 20 Monte-Carlo runs for all codes, and the results are shown in Fig. 17. The figure indicates that the spread of the time difference for each code is separated from those of the other codes. This allows correct data recovery in practical circuits where process variation causes different skew and jitter in different data paths.

Figure 17 The time difference between the positive edge of the reference clock pulse and the positive edge of the data pulse with 16 different Monte-Carlo simulations for all codes.

Corner simulations for the 3 Gb/s DTS link have been carried out to study the effect of process variations on the designed delay-line values. Table 7 compares the timing values at each corner, where T_P, T_min1, T_PPM1, T_D, T_PPM2, T_min2 and the link resolution are defined in Fig. 2(d). Table 7 shows that the pulse widths and spacings change with process variation; they are also affected by temperature and supply-voltage variations. The delay-line values therefore need to be controlled over process, voltage and temperature (PVT) variations. The authors have proposed a delay-locked loop (DLL) circuit in [12] to stabilize the delay-line values of the proposed PPM circuit against PVT variations. Using the DLL circuit on both the transmitter and the receiver sides restores all the designed values to their typical-typical (TT) values. The DLL circuit tracks any change in the delay values and corrects it within one clock cycle.
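The DLL of [12] is not described here in detail. As a generic illustration only, the sketch below models a digital DLL that locks the total delay of an N-cell delay line to one reference-clock period, so that each cell delay is set by the reference period rather than by the process corner. The corner factors, the linear cell model, the step size and the reference period are all hypothetical. Unlike the single-cycle correction reported for [12], this simple bang-bang loop needs several iterations to lock.

```python
REF_PERIOD_PS = 1000.0   # hypothetical reference clock period (ps)
N_CELLS = 16             # hypothetical number of cells locked to one period
STEP_PS = 0.5            # hypothetical cell-delay change per control step (ps)
TARGET_CELL_PS = REF_PERIOD_PS / N_CELLS   # 62.5 ps per cell once locked

# Hypothetical corner factors: slow corners lengthen the cell delay, fast ones shorten it.
CORNERS = {"TT": 1.00, "SS": 1.25, "FF": 0.80}

def cell_delay_ps(corner_factor: float, ctrl: int) -> float:
    """Cell delay: corner-scaled base delay plus STEP_PS per control step (assumed linear)."""
    return corner_factor * TARGET_CELL_PS + STEP_PS * ctrl

def lock(corner_factor: float, max_iter: int = 1000) -> int:
    """Bang-bang DLL: adjust the control code until N_CELLS cells span one reference period."""
    ctrl = 0
    for _ in range(max_iter):
        error = N_CELLS * cell_delay_ps(corner_factor, ctrl) - REF_PERIOD_PS
        if abs(error) <= N_CELLS * STEP_PS / 2:   # within half a step of the whole line
            break
        ctrl += -1 if error > 0 else 1   # line delay too long -> step down, too short -> step up
    return ctrl

for name, factor in CORNERS.items():
    ctrl = lock(factor)
    print(f"{name}: ctrl={ctrl:+d}, locked cell delay = {cell_delay_ps(factor, ctrl):.3f} ps "
          f"(TT target {TARGET_CELL_PS:.3f} ps)")
```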

Table 7 The designed values of the 3 Gb/s DTS link at different corners.

In addition, the TDC circuit tolerates small mismatches between the delay values. Figure 18 shows the TDC output code vs the input time difference. For instance, for the code ‘010’ to be recognized as ‘010’, the time difference between the rising edge of the clock and the rising edge of the data signal must lie between 98 ps and 151 ps, as shown in Fig. 18, while the designed value is 125 ps. A deviation of up to about 26 ps in either direction from the designed value is therefore still decoded correctly.
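This tolerance can be expressed as a decision-window lookup, in the style of a thermometer-code TDC. In the sketch below only the two thresholds bounding ‘010’ (98 ps and 151 ps) come from Fig. 18; all other thresholds are hypothetical placeholders spaced at an assumed 62.5 ps resolution. Any timing error that keeps the time difference inside the window of the intended code is decoded correctly.

```python
from bisect import bisect_right

# Decision thresholds (ps) between adjacent 3-bit codes.
# 98.0 and 151.0 bound code '010' (Fig. 18); the other values are hypothetical
# placeholders spaced at an assumed 62.5 ps resolution.
THRESHOLDS_PS = [35.5, 98.0, 151.0, 213.5, 276.0, 338.5, 401.0]

def tdc_decode(delta_t_ps: float) -> str:
    """Return the 3-bit code whose window contains delta_t_ps (thermometer count)."""
    code = bisect_right(THRESHOLDS_PS, delta_t_ps)   # number of thresholds <= delta_t
    return format(code, "03b")

designed = 125.0   # designed clock-to-data time difference for '010' (ps)
for offset in (-30, -20, 0, 20, 30):
    t = designed + offset
    print(f"{t:6.1f} ps -> {tdc_decode(t)}")

# Margin that the '010' window leaves around the designed value
print("margin below:", designed - THRESHOLDS_PS[1], "ps;",
      "margin above:", THRESHOLDS_PS[2] - designed, "ps")
```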

Figure 18

The output code vs input time difference for the designed TDC circuit.

8 Bit-Error-Rate Simulation

The bit-error-rate (BER) of the 3 Gb/s DTS link under additive white Gaussian noise (AWGN) has been simulated using Matlab as explained in [23]. Figure 19 shows the BER of the DTS link for different values of the signal-to-noise ratio (SNR), compared with the BER presented in [24] for a 1.25 Gb/s serial link and that presented in [25] for a 3.125 Gb/s serial link. The figure shows that the DTS link achieves a lower BER than the other two links.
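The exact simulation procedure of [23] is not reproduced here. As a generic illustration of how a BER-vs-SNR curve is obtained under AWGN, the sketch below runs a Monte-Carlo simulation of binary antipodal signalling with a zero-threshold detector and checks it against the closed-form result BER = ½·erfc(√(Eb/N0)). It does not model the DTS waveform or its time-domain detection; the signalling scheme, the use of Eb/N0 as the SNR measure, the sample count and the seed are all assumptions.

```python
import math
import random

def simulate_ber(ebn0_db: float, n_bits: int = 100_000, seed: int = 1) -> float:
    """Monte-Carlo BER of +/-1 signalling in AWGN with a zero-threshold detector."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))   # Eb = 1, noise variance N0/2
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, sigma)
        if (received > 0) != bool(bit):   # decision disagrees with the sent bit
            errors += 1
    return errors / n_bits

def theoretical_ber(ebn0_db: float) -> float:
    """Closed-form BER of antipodal signalling in AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebn0_db / 10)))

for snr_db in (0, 2, 4, 6):
    print(f"Eb/N0 = {snr_db} dB: simulated {simulate_ber(snr_db):.2e}, "
          f"theory {theoretical_ber(snr_db):.2e}")
```

Matching the simulated points to the closed form is a quick sanity check before the same Monte-Carlo loop is applied to a link-specific signal and detector model.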

Figure 19

The simulated bit error rate vs the signal-to-noise ratio of the 3 Gb/s DTS link.

Another factor affecting signal quality in serial links is crosstalk. Crosstalk arises from coupling on the printed-circuit board (PCB), in both the line card and the backplane, inside the package, and within the backplane connectors. The amount of crosstalk depends on the signal amplitudes, the signal spectrum and the trace or cable length, and it can lead to either delayed or advanced edges. In high-speed serial links, designers use the minimum transmitter amplitude that still achieves reliable BER operation, which reduces the effects of crosstalk. In the proposed DTS architecture, the effect of crosstalk is significantly reduced because the reference clock pulse and the data pulse are transmitted in the same signal over one wire or trace. As a result, any delay applied to the whole transmitted signal shifts all of its edges equally and keeps the time differences between them constant.
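The single-wire argument can be checked numerically. The sketch below uses hypothetical edge times for one DTS period and compares a delay common to every edge (clock and data on one wire) with independent skews on separate clock and data wires. It only covers delays that are common to all edges of a period; crosstalk that shifts individual edges differently is not captured.

```python
import random

# Hypothetical edge times (ps) within one DTS period on a single wire.
edges = {"clk_rise": 0.0, "clk_fall": 300.0, "data_rise": 900.0, "data_fall": 2100.0}

def info_intervals(e: dict) -> tuple:
    """Time differences that carry the data: clock rise to each data edge."""
    return (e["data_rise"] - e["clk_rise"], e["data_fall"] - e["clk_rise"])

rng = random.Random(0)

# Case 1: one wire, so a common delay shifts every edge by the same amount.
common = rng.uniform(-50.0, 50.0)
one_wire = {k: t + common for k, t in edges.items()}

# Case 2: separate clock and data wires, each with its own (hypothetical) skew.
clk_skew, data_skew = rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)
two_wires = {k: t + (clk_skew if k.startswith("clk") else data_skew)
             for k, t in edges.items()}

def show(label: str, e: dict) -> None:
    print(label, tuple(round(x, 6) for x in info_intervals(e)))

show("original :", edges)
show("one wire :", one_wire)    # unchanged, up to floating-point rounding
show("two wires:", two_wires)   # both intervals shifted by data_skew - clk_skew
```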

9 Measurements

As a proof of concept, a 4-bit 700 Mb/s and a 4-bit 1.6 Gb/s DTS-based link have been set up using a commercial FPGA board. The 700 Mb/s link has been built from the discrete logic elements of the FPGA. With these discrete elements, the link rate cannot be pushed above 700 Mb/s because of the frequency limitations of the switching components on the FPGA board. To achieve a higher link rate, the SerDes circuits on the same board have been used to generate the transmitted signal of the 1.6 Gb/s DTS link, as explained in this section.

9.1 The 700 Mb/s DTS Link

9.1.1 The Transmitter Circuit

Figure 20 shows the transmitter circuit diagram of a 4-bit 700 Mb/s DTS-based link. The circuit consists of two PPM circuits that modulate the positive and the negative edges of the clock signal independently, a D-type flip-flop that combines the PPM-1 and PPM-2 output signals into one data signal, and an OR gate that embeds the reference clock pulse in the data signal to generate the transmitted signal. The input data bits B0 and B1 modulate the rising edge of each clock pulse, and the input data bits B2 and B3 modulate the falling edge of the input clock signal, which runs at 175 MHz. In this link, the delay values of the PPM circuits have not been chosen hierarchically; instead, they have been chosen according to the delay values available on the FPGA board. The PPM circuit used in the link implementation has been published by the authors in [12]. Figure 21 shows the eye diagram of the designed transmitted signal with the designed time spacings indicated.
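The transmitter's net behaviour can be sketched as an edge-placement model: in each clock period the OR gate outputs the reference clock pulse followed by a data pulse whose rising edge is selected by B1B0 (PPM-1) and whose falling edge is selected by B3B2 (PPM-2), which is what the D-type flip-flop combination produces. The clock frequency follows the 175 MHz used in the measurements; the pulse width, the four delay values per PPM circuit and the mapping from bit values to delays are hypothetical, since the FPGA delay values are not listed here.

```python
CLK_PERIOD_PS = 1e12 / 175e6   # 175 MHz input clock, about 5714 ps per period
T_P_PS = 700.0                 # hypothetical reference clock pulse width (ps)
# Hypothetical delays: B1B0 selects the data rising edge (PPM-1),
# B3B2 selects the data falling edge (PPM-2), both relative to the clock rise.
PPM1_DELAYS_PS = [1400.0, 1800.0, 2200.0, 2600.0]
PPM2_DELAYS_PS = [3400.0, 3800.0, 4200.0, 4600.0]

def encode_period(bits: int, period_index: int) -> list:
    """Return the (time_ps, level) transitions of one transmitted period.

    bits is the 4-bit word B3B2B1B0. The result is the OR of the reference clock
    pulse and the data pulse, i.e. two separate pulses per period.
    """
    t0 = period_index * CLK_PERIOD_PS
    rise = PPM1_DELAYS_PS[bits & 0b11]
    fall = PPM2_DELAYS_PS[(bits >> 2) & 0b11]
    if not (T_P_PS < rise < fall < CLK_PERIOD_PS):
        raise ValueError("clock and data pulses must not overlap")
    return [(t0, 1), (t0 + T_P_PS, 0), (t0 + rise, 1), (t0 + fall, 0)]

def encode_stream(words: list) -> list:
    """Concatenate the transitions of consecutive clock periods."""
    out = []
    for i, w in enumerate(words):
        out.extend(encode_period(w, i))
    return out

for t, level in encode_stream([0b0000, 0b1001]):
    print(f"{t:8.1f} ps -> {level}")
```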

Figure 20

The transmitter block diagram.

Figure 21

The eye diagram of the designed transmitted signal.

9.1.2 The Receiver Circuit

Figure 22 shows the receiver circuit. Two counters separate the edges of the received signal into two signals to start the recovery process, and TDC circuits translate the time differences between signal edges into a binary code corresponding to the transmitted pattern. A calibration circuit calibrates the time difference between the TDC input signal edges; the calibration is performed once before transmission starts. The receiver uses two TDC circuits: TDC_1 recovers the transmitted bits B0 and B1, and TDC_2 recovers the transmitted bits B2 and B3. Figure 23 shows the timing diagram of the signals at different points in the receiver circuit. T1 and T3 are the time differences between the positive edge of the clock pulse and the positive edge of the data pulse, and T2 and T4 are the time differences between the positive edge of the clock pulse and the negative edge of the data pulse.
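The recovery steps can be sketched behaviourally as follows. A mod-2 count of rising and falling edges stands in for the two counters (it separates clock-pulse edges from data-pulse edges), and a nearest-nominal-value decision stands in for the two TDCs. The nominal intervals, the mapping from intervals to bits and the received edge times are hypothetical, and the calibration step is not modelled.

```python
# Hypothetical nominal intervals (ps) from the clock rising edge, one per 2-bit value.
T_RISE_PS = [1400.0, 1800.0, 2200.0, 2600.0]   # T1/T3: clock rise -> data rise (B1B0)
T_FALL_PS = [3400.0, 3800.0, 4200.0, 4600.0]   # T2/T4: clock rise -> data fall (B3B2)

def nearest_index(value: float, table: list) -> int:
    """TDC-like decision: index of the nominal interval closest to the measured one."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

def split_edges(transitions: list) -> list:
    """Counter-style separation: rising (and falling) edges alternate clock/data.

    Returns one (clk_rise, data_rise, data_fall) tuple per period.
    """
    rises = [t for t, lvl in transitions if lvl == 1]
    falls = [t for t, lvl in transitions if lvl == 0]
    periods = []
    for k in range(len(rises) // 2):
        clk_rise, data_rise = rises[2 * k], rises[2 * k + 1]
        data_fall = falls[2 * k + 1]   # falls[2k] is the clock pulse falling edge
        periods.append((clk_rise, data_rise, data_fall))
    return periods

def decode(transitions: list) -> list:
    """Recover the 4-bit words B3B2B1B0 from the received edge times."""
    words = []
    for clk_rise, data_rise, data_fall in split_edges(transitions):
        b10 = nearest_index(data_rise - clk_rise, T_RISE_PS)
        b32 = nearest_index(data_fall - clk_rise, T_FALL_PS)
        words.append((b32 << 2) | b10)
    return words

# Two hypothetical received periods: 830 ps channel delay, 5714 ps period,
# and a few ps of edge jitter.
rx = [(830.0, 1), (1531.0, 0), (2228.0, 1), (4232.0, 0),
      (6544.0, 1), (7247.0, 0), (8347.0, 1), (11145.0, 0)]
print([format(w, "04b") for w in decode(rx)])
```

Because every decision uses a difference from the clock rising edge of the same period, the 830 ps channel delay cancels out.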

Figure 22

The receiver circuit block diagram.

Figure 23

The timing diagram of the received signal, counter_1 and counter_2 output signals.

Figure 24 shows the diagram of the TDC circuit used in the link implementation. The TDC is built using a differential scheme such as that presented in [26], and a decoder converts the flip-flop outputs into the transmitted bits.
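The details of the scheme in [26] are not given here. As an assumption, the sketch below models one common differential-delay TDC, the Vernier delay line: the start edge travels through cells of delay `slow_ps` and the stop edge through faster cells of delay `fast_ps`, so the stop edge gains `slow_ps - fast_ps` per stage, and each flip-flop records whether the start edge is still ahead. The resolution is the difference of the two cell delays. The cell delays and stage count are hypothetical, and integer picoseconds are used to avoid floating-point ties.

```python
def vernier_tdc(delta_t_ps: int, slow_ps: int, fast_ps: int, n_stages: int) -> list:
    """Thermometer output of a Vernier (differential-delay) TDC.

    Flip-flop k outputs 1 if the start edge reaches stage k no later than the stop
    edge launched delta_t_ps later: k * slow_ps <= delta_t_ps + k * fast_ps.
    """
    if fast_ps >= slow_ps:
        raise ValueError("the stop line must be faster than the start line")
    return [1 if k * slow_ps <= delta_t_ps + k * fast_ps else 0 for k in range(n_stages)]

def thermometer_to_count(bits: list) -> int:
    """Decoder: number of leading ones in the thermometer code."""
    count = 0
    for b in bits:
        if not b:
            break
        count += 1
    return count

# Hypothetical cell delays: 60 ps and 40 ps give a 20 ps resolution.
for dt in (0, 15, 50, 110):
    therm = vernier_tdc(dt, slow_ps=60, fast_ps=40, n_stages=8)
    print(dt, therm, thermometer_to_count(therm))
```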

Figure 24

The TDC circuit diagram.

9.1.3 Measurement Results

A 4-bit 700 Mb/s DTS-based link, with an input bit rate of 175 Mb/s per bit, has been set up with an input clock frequency of 175 MHz. Figure 25 shows the measurement setup, which uses two Altera transceiver signal integrity development FPGA kits. The transmitter output is applied to an output buffer stage and then to a 40-inch FR4 channel. The channel output is applied to an input buffer stage and then to the receiver circuit, as shown. The FPGA board has six serializer/deserializer (SerDes) links. When a SerDes link is used in the reverse CDR mode, the board bypasses all the link blocks, which gives access to the input and output buffer stages. Two SerDes links have been used as indicated: one as an output buffer stage and the other as an input buffer stage. Two FPGA boards have been used in the measurement setup because one board does not have enough input/output terminals.

Figure 25

The measurement set up block diagram.

Figure 26 shows the measured eye diagram of the data signal before the clock is embedded. The diagram shows four positive edges, which represent the four possible logic combinations of two bits, and four negative edges, which represent the four possible combinations of the other two bits. The time spacings between the positive and negative edges are indicated. Figure 27 shows the reference clock pulse signal before it is merged with the data signal, with the period and the pulse width indicated. Figure 28 shows the transmitted signal after the reference clock pulses are embedded, with the values of T_PPM1, T_PPM2, T_P, T_D and T_min indicated.

Figure 26

The measured eye diagram of the data signal of 700 Mb/s DTS link.

Figure 27

The measured eye diagram of the reference clock pulse of the 700 Mb/s DTS link.

Figure 28

The measured eye diagram of the transmitted signal of the 700 Mb/s DTS link.

Figure 29 shows the eye diagram of the received signal at the end of the 40-inch FR4 channel. The time differences between the edges are unchanged after transmission. The figure also shows that the reference clock pulse has jitter due to the noise generated by the switching elements on the FPGA board. This jitter does not affect the recovery process because it is much smaller than the time spacing between the data pulse edges. The received signal has been detected and the transmitted bits have been recovered successfully.

Figure 29

The measured eye diagram of the received signal of the 700 Mb/s DTS link after the 40-inch FR-4 channel.

9.2 The 1.6 Gb/s DTS Link

The logic circuits on the FPGA board cannot operate at frequencies higher than those used in the 700 Mb/s example. To provide an example with a higher link rate, the SerDes transmitter built into the FPGA board, which includes a 2-tap pre-emphasis circuit, has been used to create the transmitted signal of a 4-bit 1.6 Gb/s DTS-based link, as shown in Fig. 30. Four counters and a decoder generate the required 32-bit pattern for the SerDes transmitter, covering the bit combinations needed to generate the designed transmitted signal. Figure 31 shows the eye diagram of the resulting transmitted signal, with the values of T_P, T_D, T_min and the link resolution τ indicated; T_PPM1 = T_PPM2 = 3τ. The transmitted signal has been applied to a 40-inch FR4 channel. Figure 32 shows the signal detected at the end of the channel: the edge spacings are the same as transmitted, and the transmitted bits are recovered successfully.
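With a SerDes transmitter, each DTS symbol becomes a run of serial bits, so every edge falls on a SerDes unit-interval (UI) boundary. The sketch below assumes that the link resolution τ equals one UI and lays out one symbol using the segment names of Fig. 2(d). Only T_PPM1 = T_PPM2 = 3τ comes from the text; the other segment lengths are hypothetical. The sketch does not reproduce the counter-and-decoder circuit, the 32-bit word alignment or the pre-emphasis.

```python
# Segment lengths in SerDes unit intervals (1 UI = tau, assumed).
# T_PPM1 = T_PPM2 = 3 tau is from the text; the other lengths are hypothetical.
T_P, T_MIN1, T_PPM1, T_D, T_PPM2, T_MIN2 = 1, 1, 3, 1, 3, 1
SYMBOL_UI = T_P + T_MIN1 + T_PPM1 + T_D + T_PPM2 + T_MIN2   # 10 UI per DTS symbol

def dts_symbol_bits(word: int) -> list:
    """SerDes bit pattern of one DTS symbol for a 4-bit word B3B2B1B0.

    B1B0 delays the data rising edge by 0..3 UI inside the T_PPM1 window;
    B3B2 delays the data falling edge by 0..3 UI inside the T_PPM2 window.
    """
    rise = T_P + T_MIN1 + (word & 0b11)
    fall = T_P + T_MIN1 + T_PPM1 + T_D + ((word >> 2) & 0b11)
    bits = [0] * SYMBOL_UI
    for i in range(T_P):           # reference clock pulse
        bits[i] = 1
    for i in range(rise, fall):    # data pulse
        bits[i] = 1
    return bits

def serdes_stream(words: list) -> str:
    """Concatenate the symbols into the serial bit string sent by the SerDes."""
    return "".join("".join(map(str, dts_symbol_bits(w))) for w in words)

print(serdes_stream([0b0000, 0b1111]))
```

With this layout the data pulse is never shorter than T_D (latest rise, earliest fall) and never runs into the next clock pulse (the T_min2 gap stays low).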

Figure 30

The 1.6 Gb/s transmitter circuit.

Figure 31

The eye diagram of the transmitted signal of the 1.6 Gb/s link.

Figure 32

The eye diagram of the received signal of the 1.6 Gb/s link after the 40-inch FR-4 channel.

10 Conclusions

This paper presents a new differential-time-based architecture that modulates both the rising and the falling edges of the input clock signal and embeds the clock in the transmitted signal to achieve high data rates. The link takes advantage of the improved time resolution of scaled CMOS technology by modulating the input data in time rather than multiplexing it as in other architectures. In the DTS link, the data signal is generated from the clock signal and a reference clock pulse is embedded in the transmitted signal; as a result, the effect of jitter is significantly reduced. The proposed link concentrates the transmitted signal energy in a low bandwidth and does not require a very high input clock frequency. The design steps and the circuitry of a 6-bit 3 Gb/s DTS link have been provided. The link has been simulated using Cadence tools in a 90 nm mixed-signal CMOS process, and the simulation results have been presented. The total power consumption of the simulated DTS link is 55 mW. Thermal-noise-induced jitter calculations and simulations for a delay line have been presented. The effective RMS jitter values of the time differences between the positive edge of the reference clock pulse and the two data pulse edges are 1 ps and 1.3 ps. Monte-Carlo and process-corner simulations have been performed on the DTS link to study the effect of process variation and mismatch. The BER vs SNR of the link has been simulated and compared with other serial links, showing better results.

700 Mb/s and 1.6 Gb/s DTS links have been designed and implemented on an FPGA board, and two different ways to design the transmitter have been explained. The measured eye diagrams show that the time spacings are preserved over the transmission distance, and the transmitted codes have been recovered successfully at the receiver side.